TOPS-10 Crash Analysis Guide
| Electronically Distributed
|
|
|
| This guide provides methods for analyzing TOPS-10
| system crashes. It describes the tools that can
| be useful in the process of diagnosing the cause
| of system failure, and suggests methods of solving
| the problem that caused the failure. This book is
| intended to be used by experienced TOPS-10 system
| programmers and assumes that the reader has
| adequate system privileges to complete the
| procedures presented.
|
| This guide supersedes the TOPS-10 Crash Analysis
| Guide published in January, 1989. The order
| number for that guide, AA-H206D-TB, is obsolete.
Operating System: TOPS-10 Version 7.04
digital equipment corporation maynard, massachusetts
| TOPS-10 Update Tape No. 03, September 1990
First printing, November 1978
Revised, August 1980
Revised, April 1986
Revised, January 1989
The information in this document is subject to change without notice
and should not be construed as a commitment by Digital Equipment
Corporation. Digital Equipment Corporation assumes no responsibility
for any errors that may appear in this document.
The software described in this document is furnished under a license
and may only be used or copied in accordance with the terms of such
license.
No responsibility is assumed for the use or reliability of software
on equipment that is not supplied by DIGITAL or its affiliated
companies.
| Copyright (C) 1978, 1980, 1986, 1989, 1990 by Digital Equipment
| Corporation
All Rights Reserved.
The following are trademarks of Digital Equipment Corporation:
CI              DECtape     LA50            SITGO-10
DDCMP           DECUS       LN01            TOPS-10
DEC             DECwriter   LN03            TOPS-20
DECmail         DELNI       MASSBUS         TOPS-20AN
DECnet          DELUA       PDP             UNIBUS
DECnet-VAX      HSC         PDP-11/24       UETP
DECserver       HSC-50      PrintServer     VAX
DECserver 100   KA10        PrintServer 40  VAX/VMS
DECserver 200   KI          Q-bus           VT50
DECsystem-10    KL10        ReGIS
DECSYSTEM-20    KS10        RSX             d i g i t a l
CONTENTS
PREFACE
CHAPTER 1 INTRODUCTION
1.1 SYSTEM ERROR RECOVERY . . . . . . . . . . . . . . 1-1
1.2 TYPES OF ERRORS . . . . . . . . . . . . . . . . . 1-3
1.3 CRASH ANALYSIS TOOLS . . . . . . . . . . . . . . . 1-4
1.4 CRASH ANALYSIS PROCEDURE . . . . . . . . . . . . . 1-5
CHAPTER 2 EXAMINING A CRASH FILE
2.1 CREATING A CRASH FILE . . . . . . . . . . . . . . 2-1
2.2 USING FILDDT . . . . . . . . . . . . . . . . . . . 2-3
2.3 ESTABLISHING PROPER MAPPING . . . . . . . . . . . 2-5
2.3.1 FILDDT Mapping Instructions . . . . . . . . . . 2-5
2.3.2 Mapping the Crash . . . . . . . . . . . . . . . 2-6
2.4 VERIFYING THE DUMP . . . . . . . . . . . . . . . . 2-9
2.5 FILDDT COMMAND FILES . . . . . . . . . . . . . . . 2-9
2.6 STOPCODE INFORMATION . . . . . . . . . . . . . . 2-12
CHAPTER 3 LOCATING THE FAILURE
3.1 HARDWARE MAPPING . . . . . . . . . . . . . . . . . 3-2
3.2 PAGING POINTERS . . . . . . . . . . . . . . . . . 3-2
3.3 EXTENDED ADDRESSING . . . . . . . . . . . . . . . 3-3
3.4 MONITOR-RESIDENT USER DATA . . . . . . . . . . . . 3-3
3.5 PROGRAM COUNTER WORD . . . . . . . . . . . . . . . 3-4
3.6 PROCESSOR MODES . . . . . . . . . . . . . . . . . 3-5
3.6.1 User Mode . . . . . . . . . . . . . . . . . . . 3-5
3.6.2 Exec Mode . . . . . . . . . . . . . . . . . . . 3-6
3.7 THE PRIORITY INTERRUPT SYSTEM . . . . . . . . . . 3-7
3.8 THE DEVICE INTERRUPT SERVICE . . . . . . . . . . . 3-9
3.8.1 Standard Interrupts . . . . . . . . . . . . . . 3-9
3.8.2 Vectored Interrupts . . . . . . . . . . . . . 3-12
3.9 TRAPS . . . . . . . . . . . . . . . . . . . . . 3-12
3.9.1 Page Fail Traps . . . . . . . . . . . . . . . 3-12
3.10 CLOCK LEVEL . . . . . . . . . . . . . . . . . . 3-14
3.11 ACCUMULATORS AND PUSH-DOWN LISTS . . . . . . . . 3-15
3.12 MONITOR ORGANIZATION . . . . . . . . . . . . . . 3-16
3.12.1 Monitor Startup Modules . . . . . . . . . . . 3-17
3.12.2 Symbol Definition Modules . . . . . . . . . . 3-18
3.13 EXAMPLES OF LOCATING FAILURES . . . . . . . . . 3-18
CHAPTER 4 EXAMINING THE DATA STRUCTURES
4.1 SYMBOLS . . . . . . . . . . . . . . . . . . . . . 4-1
4.1.1 Naming Conventions . . . . . . . . . . . . . . . 4-1
4.1.2 Symbol Files and Monitor Generation . . . . . . 4-5
4.2 READING THE CODE . . . . . . . . . . . . . . . . . 4-5
4.2.1 How to Use a CREF Listing . . . . . . . . . . . 4-6
4.2.2 Macros . . . . . . . . . . . . . . . . . . . . . 4-6
4.2.3 Conditional Assembly . . . . . . . . . . . . . . 4-7
4.2.4 Finding Symbols . . . . . . . . . . . . . . . . 4-7
4.3 JOB-RELATED DATA STRUCTURES . . . . . . . . . . . 4-8
4.4 CPU DATA STRUCTURES . . . . . . . . . . . . . . 4-10
4.5 MEMORY DATA STRUCTURES . . . . . . . . . . . . . 4-12
4.6 COMMAND PROCESSING TABLES . . . . . . . . . . . 4-12
4.7 UUO PROCESSING TABLES . . . . . . . . . . . . . 4-13
4.8 I/O DATA STRUCTURES . . . . . . . . . . . . . . 4-13
4.9 THE JOB DEVICE ASSIGNMENT TABLE . . . . . . . . 4-13
4.10 THE DEVICE DATA BLOCK . . . . . . . . . . . . . 4-14
4.11 FINDING DDB INFORMATION . . . . . . . . . . . . 4-15
4.12 LINE DATA BLOCKS (LDBS) . . . . . . . . . . . . 4-18
4.13 THE SCNSER DATA BASE . . . . . . . . . . . . . . 4-19
4.14 TERMINAL CHUNKS . . . . . . . . . . . . . . . . 4-19
4.15 TERMINAL DEVICE DATA BLOCKS . . . . . . . . . . 4-20
4.16 FINDING TERMINAL I/O INFORMATION . . . . . . . . 4-21
4.17 TAPE DRIVES . . . . . . . . . . . . . . . . . . 4-23
4.18 DISKS . . . . . . . . . . . . . . . . . . . . . 4-24
4.18.1 Finding Information on Disk . . . . . . . . . 4-27
4.18.2 In-Core File Information . . . . . . . . . . . 4-31
4.18.3 The Software Disk Cache . . . . . . . . . . . 4-34
4.18.4 Finding In-Core File Information . . . . . . . 4-35
CHAPTER 5 ERROR HANDLING ROUTINES
5.1 HARDWARE ERRORS . . . . . . . . . . . . . . . . . 5-1
5.1.1 APR Interrupt Routine . . . . . . . . . . . . . 5-4
5.1.2 Page Fail Trap Routine . . . . . . . . . . . . . 5-4
5.1.3 Saved Hardware Error Information . . . . . . . . 5-5
5.1.4 Hardware Error Checking . . . . . . . . . . . . 5-6
5.2 STOPCODES . . . . . . . . . . . . . . . . . . . 5-10
5.2.1 Stopcode Processing . . . . . . . . . . . . . 5-12
5.2.2 Continuing from Stopcodes . . . . . . . . . . 5-13
5.2.3 Special Stopcodes . . . . . . . . . . . . . . 5-13
5.3 ERRORS DETECTED BY RSX-20F . . . . . . . . . . . 5-15
CHAPTER 6 DEBUGGING THE MONITOR
6.1 PATCHING WITH FILDDT . . . . . . . . . . . . . . . 6-1
6.2 USING EDDT . . . . . . . . . . . . . . . . . . . . 6-2
6.2.1 Starting the Monitor . . . . . . . . . . . . . . 6-2
6.2.2 Breakpoints . . . . . . . . . . . . . . . . . . 6-3
6.3 DEBUGF FLAGS . . . . . . . . . . . . . . . . . . . 6-3
6.4 MULTI-CPU ENVIRONMENT . . . . . . . . . . . . . . 6-4
6.5 CAUTIONS . . . . . . . . . . . . . . . . . . . . . 6-5
APPENDIX A ADDRESS SPACE LAYOUT
GLOSSARY
INDEX
FIGURES
A-1 Monitor Code Section Layout . . . . . . . . . . . A-2
A-2 DECnet Code Section Layout . . . . . . . . . . . . A-3
A-3 Monitor Data Section 3 Layout . . . . . . . . . . A-4
A-4 Monitor Data Sections 4,5 Layout . . . . . . . . . A-5
A-5 Monitor Data Sections 6,7 Layout . . . . . . . . . A-6
A-6 Monitor Data Sections 35,36,37 Layout . . . . . . A-7
TABLES
3-1 Interrupt Level Indicators . . . . . . . . . . . . 3-8
4-1 Monitor Accumulators . . . . . . . . . . . . . . . 4-2
5-1 Hardware Errors . . . . . . . . . . . . . . . . . 5-9
Gloss-1 Glossary of Acronyms . . . . . . . . . . . . . Gloss-1
PREFACE
The TOPS-10 Crash Analysis Guide is a procedural and reference manual
that you can use to diagnose the causes of TOPS-10 system failures and
to correct these problems.
The TOPS-10 Software Notebook Set contains several documents that you
should use while analyzing system crashes. In particular, you will
find the TOPS-10 Monitor Tables Descriptions and the Stopcodes
Specification are most important for symbol definitions, and the
TOPS-10 DDT Manual is a useful reference for the debugging tools used
in the procedures.
Before you can reliably diagnose and repair system problems, you must
be able to use DDT commands to examine and patch the TOPS-10 monitor
modules. You must also be familiar with any local modifications that
have been made to the monitor.
There are a few symbols shown in this manual that indicate special
characters. They are:
Character    Meaning

^\           <Control-backslash> is the character to type on
             the CTY to get the attention of the parser.

$            The ESCape character, or altmode, is used in
             commands to DDT and TECO.

<CTRL/Z>     This control character is used to terminate a
             TOPS-10 process, such as DDT. It is displayed as
             ^Z.
CHAPTER 1
INTRODUCTION
Crash analysis is used in the process of solving system problems. You
can analyze a crash by examining a copy of memory that is stored in a
crash file when the operating system stops running. There are
different methods of analyzing different types of system problems. It
may be helpful, for example, to isolate the cause of a problem as
either the hardware or the software in a preliminary investigation,
but it is important to understand and recognize all symptoms of system
problems, including those involving the interaction of both hardware
and software.
This manual describes methods that you can apply to various system
problems. As you become more familiar with the monitor and the tools
you use to debug the system, you will be able to customize these
methods to your own needs.
1.1 SYSTEM ERROR RECOVERY
To successfully analyze different types of system problems, you should
try to view the system as a whole, investigating hardware status and
software conditions, as well as the interaction of the two. You can
use many informational tools to detect and correct system problems:
hardware diagnostics verify the hardware state of the machine, and
software test packages verify the performance and validity of software
components. The monitor itself is an excellent test program for both
hardware and software. It prints and saves information about the
problems it encounters on the console terminal (CTY). Each CPU in a
multiple-CPU configuration has a CTY, where it prints information
about the stopcodes it encounters, messages for the operator, and a
log of system events.
The TOPS-10 monitor and hardware systems are designed to prevent the
system from crashing when a minor error is encountered. Timesharing
is interrupted only by an unrecoverable, or fatal, error. Most system
problems are not fatal, and in most cases system operation continues
normally.
A hardware or software error that prevents normal timesharing
operation causes a crash; that is, the system performs certain error
recovery operations, terminates all user and system jobs, and restarts
operation with a fresh database. If a hardware or software error is
serious enough to warrant this procedure, the system is halted and a
copy of memory is written to disk (or dumped) before the system is
reloaded. This copy of memory, called the crash file, is useful
because the system uses this file to record the contents of many
registers and data structures. This manual describes how to examine
the crash file to find information that might indicate the reason for
the crash.
Not all hardware and software errors cause the system to crash. The
software is equipped with a number of special error recovery
procedures to continue operation after a system or user error. The
software generates a stopcode, which provides the system manager with
information about the cause of the error, and lists system modules and
data locations useful in analyzing the source of the stopcode. This
information is printed on the system's CTY to inform the operator of
the status of the system. A continuable stopcode does not cause a
system reload or halt, but, in most cases, produces a crash file.
A system error that causes a crash, like a program error that causes a
halt, is called a fatal error, because all the jobs on the system must
be halted and restarted. The system records as much information as
possible before the crash. However, in the act of reloading memory or
processing a hardware error, the operating system may lose or
overwrite applicable data locations, and a certain amount of
information may be lost. In every crash, it is important to be aware
that information recorded during the crash may be invalid or
corrupted.
The way the monitor processes the error depends on the type of failure
that occurred. The method you use to analyze the crash depends on the
type of information that the monitor saved before the crash. This
manual is organized to provide crash analysis information for
different types of crashes. Remember that this manual can only
explain ideal and general situations. As the system analyst, you
should be familiar with the specific aspects of the system you are
analyzing, because you may face unique problems at your site. If
possible, review the system build procedure, especially the
information about hardware and software configuration. This type of
information is described in the TOPS-10 Software Installation Guide.
DIGITAL provides software error reporting and revision services for
problems you cannot solve. If you cannot solve a problem that
prevents system operation, submit a Software Performance Report (SPR)
through your DIGITAL Service Representative. Be sure to include all
the information required to analyze a system crash. This manual
describes that information.
1.2 TYPES OF ERRORS
The hardware and software handle each type of system problem
differently. Most problems do not result in a crash; many errors are
handled locally for a specific program or device, without affecting
the entire system. For example, TOPS-10 is designed so that
unprivileged user jobs cannot directly crash the system. If a user
program develops a fatal error, the monitor aborts the program without
affecting the other users on the system. If the monitor data base
entries for a particular user job are destroyed, the monitor tries to
eliminate the job without affecting other jobs. However, changes to
system-wide variables such as those affecting memory and CPU usage may
cause the system to crash.
In almost all cases, the software detects and handles errors by
gathering information and taking corrective action. In the case of a
fatal error, the system reloads automatically. Fault continuation
allows the system to correct certain types of errors and continue
operation without affecting the execution of user programs. In most
cases, corrective action affects only the process at fault. Such
action might include repeating an I/O operation or stopping execution
of a user job.
Fault continuation allows the system and user jobs to continue with
little or no interruption, but continuable stopcodes are recorded on
the CTY for later examination. It is important to be aware of all
previous errors in the process of analyzing a crash, even those that
did not directly cause the system to crash. Internal discrepancies
that corrupt an important data structure may in turn affect other
routines, and the error propagates, or the software goes into an
infinite loop.
Crash files and CTY listings are the main sources of information about
the system before the time of the crash. However, error recovery code
can contain errors of its own. The history of a crash, including data
from the time leading up to the crash, is an important source of
information in these situations.
When the system crashes, you must be prepared to verify that the
system actually crashed, and determine the extent to which the
software was affected. You must isolate the problem that caused the
error by defining the point in the code where the error was detected,
then identify the problem that caused the error condition, record that
information, and correct the problem if possible.
This procedure, and the tools you will need to analyze crashes, are
described in the following chapters. Remember that your success in
these areas depends on many factors, and that it may not be possible
to correct the error immediately. It is more important to continue
system operation as soon as possible. Later, you can address the
crash using the tools described in this manual.
1.3 CRASH ANALYSIS TOOLS
To analyze a system crash, you need several sources of information,
and you must use system programs to examine the information. You must
use all your knowledge of the DECsystem-10 and the TOPS-10 monitor, as
well as the GALAXY system, ANF-10 network communications, and all
other software running on the system. The specific sources of
information about a system crash are:
o The CTY output for the time before the crash
o The crash file
o Listings or microfiche of the monitor sources, describing the
algorithms, data structures, symbols, and bit definitions
o The operator log book
o The Monitor Tables descriptions from the TOPS-10 Software
Notebook Set
You will use the following tools in analyzing system crashes.
o FILDDT (File DDT) allows you to examine files or the running
monitor. Sections 2.3 through 2.4 describe FILDDT.
o EDDT (Exec DDT) allows you to examine, breakpoint, and patch
the running monitor. Section 6.2 describes EDDT.
o CRSCPY copies crash files and stores information about them
in a database. The TOPS-10 Operator's Guide describes
CRSCPY.
o SPEAR creates reports, based on the system error log file
(ERROR.SYS), which are useful for tracing non-fatal errors
that may have led to the system crash. Refer to the
TOPS-10/20 SPEAR Reference Manual for more information about
SPEAR.
o OPR, the operator interface to the DECsystem-10, provides
commands that allow you to change the system configuration
and to control software processes. Refer to the TOPS-10
Operator's Command Language Reference Manual for more
information about this program.
You will also need to use a text editor such as TECO to patch the
monitor sources or system startup files after you have solved a
software problem.
1.4 CRASH ANALYSIS PROCEDURE
To isolate a system problem, you must use FILDDT to examine the crash
file. The crash file records the state of the system at the time of
the crash, including information you can use to determine the cause of
the crash, such as:
o Processor mode (user, user I/O, or exec mode)
o Stack pointer and stack in use
o Contents of accumulators
o Stopcode information
First you must obtain the crash file. In Chapter 2, you will learn
how the monitor creates and maintains crash files. Chapter 2 also
contains procedures for loading the monitor symbols for FILDDT and
using the symbolic FILDDT to examine a crash file and extract the
information listed above.
Chapter 3 explains how to interpret the information you obtain from
the crash file, to determine the state of the system at the time of
the crash.
Chapter 3 contains a discussion of processor modes, job scheduling,
and the priority levels that the monitor uses in timesharing, and how
the information from the crash file can point to the faulty code that
caused the crash.
After you have determined the monitor process that failed, you can
begin to investigate the crash file for the actual routine that
failed. Chapter 4 contains a description of the monitor's data
structures and how to obtain information about them from the crash
file and the source code.
The monitor may crash, or hang without crashing, because an error has
occurred in the error handling and recovery procedures. Chapter 5
contains descriptions of the system error recovery routines.
Continuable stopcodes are described in more detail. You can use the
information in this chapter to determine whether error handling
routines are functioning properly.
It is sometimes necessary to analyze and correct a system error while
the monitor is running, either because a system reload does not
correct the error, or the error only becomes apparent while the system
is running. If you encounter a problem that defies analysis using
FILDDT to examine crash files, you can use EDDT to examine and correct
locations in the running monitor. For example, if the system halts or
hangs without dumping or without reloading, or if a problem exists
that does not interfere with timesharing, you can use EDDT to examine
the running monitor. This procedure is described in Chapter 6.
A Glossary of the acronyms used in this manual is provided at the back
of the manual.
Appendix A contains illustrations of the general layout of monitor
code in virtual address space, for TOPS-10 Version 7.04.
CHAPTER 2
EXAMINING A CRASH FILE
When the system crashes, the monitor attempts to record information
about the state of the system at the time of the crash. Normally, the
system writes a copy of memory to disk before beginning system reload
operations. This copy of memory is called a crash file, or just "a
crash". You can examine this file using a special version of DDT
called FILDDT. This chapter explains in more detail how the crash
file is created and how to locate the crash file for a particular
crash. The procedure for preparing FILDDT so that you can examine the
crash file is also described, as well as some of the information that
you can obtain immediately by examining the CTY output of stopcode
information.
2.1 CREATING A CRASH FILE
When a stopcode occurs, BOOT automatically creates a crash file of the
contents of memory, called CRASH.EXE, and copies it to the system
crash list. If BOOT cannot dump memory automatically, you can force a
dump by typing the following command on the CTY:
BOOT>str:/D
Use /D to force the crash file to be written. You may include the
name of a file structure (str:).
If this action fails, the CRASH.EXE file on every file structure in
the system crash list may still be waiting to be processed by CRSCPY.
The allocation of CRASH.EXE space is accomplished when you define file
structure information in the ONCE dialog. You can modify the amount
of space reserved for crash files by running the monitor in user mode.
Refer to the TOPS-10 Software Installation Guide for complete
information about ONCE.
To stop the machine when a malfunction occurs, deposit a non-zero
value into physical location 30. The monitor checks this location at
every clock tick. If it finds a non-zero value, the monitor jumps
into BOOT. You can initiate this procedure using one of the following
commands.
The first example is a command to the PARSER on a KL system. Type
<CTRL/backslash> where you see ^\. In the following examples,
semicolons precede comments that should not be included in your input.
^\ ;invoke the PARSER
PAR>SHUTDOWN ;shut down the system
[Dumping on DSKA:CRASH.EXE[1,4]]
For a KS system, you type the following commands:
^\ ;invoke the console
ENABLED
KS10>SHUTDOWN ;shut down the system
USR MOD
[Dumping on DSKA:CRASH.EXE[1,4]]
If the monitor can reach clock level, this command will start BOOT.
BOOT stops the machine, writes a crash file, and begins automatic
reload procedures. If the monitor has been up less than five minutes,
BOOT starts, but does not initiate the dump and reload action.
Instead, BOOT prints the BOOT> prompt and waits for you to type a
command.
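The clock-level check and BOOT's choice of action described above can be summarized in a short Python sketch. It is an illustration only, not monitor code: the function names, the dictionary standing in for physical memory, and the returned strings are hypothetical. Only location 30 (taken here to be octal, as is usual for PDP-10 addresses) and the five-minute threshold come from the text.

```python
# Illustrative model only; the names below are hypothetical, not monitor symbols.
SHUTDOWN_FLAG_ADDR = 0o30      # physical location 30 (assumed octal)
MIN_UPTIME_SECONDS = 5 * 60    # the "less than five minutes" threshold

def clock_tick(memory):
    """Return True if the monitor should jump into BOOT on this tick."""
    return memory.get(SHUTDOWN_FLAG_ADDR, 0) != 0

def boot_action(uptime_seconds):
    """Model BOOT's choice between prompting and automatic dump/reload."""
    if uptime_seconds < MIN_UPTIME_SECONDS:
        return "prompt"            # BOOT prints BOOT> and waits for a command
    return "dump-and-reload"       # BOOT writes the crash file, then reloads

memory = {SHUTDOWN_FLAG_ADDR: 0}
memory[SHUTDOWN_FLAG_ADDR] = 1     # a non-zero value is deposited
if clock_tick(memory):
    print(boot_action(uptime_seconds=3600))
```

The sketch assumes the monitor can still reach clock level; if it cannot, the deposit is never noticed and BOOT is not started this way.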
If the SHUTDOWN command is ineffective, you must instruct the monitor
to begin system shutdown procedures. The following commands to the
PARSER accomplish that on a KL system:
^\ ;invoke the PARSER
PAR>SET CONSOLE MAINTENANCE
PAR>HALT
PAR>EXAMINE KL
PAR>JUMP 407
This instructs the monitor to execute the instruction at location 407,
which signals the policy CPU to initiate a system shutdown procedure.
In multiple-processor systems, it may be desirable to initiate system
shutdown procedures on the current CPU instead of the policy CPU. To
accomplish this, jump to location 406 instead, using the following
command:
PAR>JUMP 406
For the KS, you might use the following procedure to force a system
shutdown:
^\ ;invoke the console
ENABLED
KS10>HALT ;halts the system
KS10>MR ;forces exec mode
KS10>SM ;halts at default location
KS10>ST 407 ;loads BOOT
USR MOD
You should try to use the SHUTDOWN procedure first, because a forced
reload does not save the PC, and there is danger of losing device and
interrupt status information.
After a fatal stopcode or a manual dump operation, BOOT displays the
following information on the CTY:
[Dumping on DSKA:CRASH.EXE[1,4]]
[Loading from DSKA:SYSTEM.EXE[1,4]]
As the second message indicates, BOOT automatically reloads the
monitor. The automatic reload function can be disabled using the OPR
program. This function is useful when debugging the monitor, as
described in Chapter 6.
The CRSCPY program runs when the system is reloaded, to copy the
CRASH.EXE file to a unique file name that will not be superseded by
subsequent CRSCPY runs. If your system did not run CRSCPY when it
reloaded, you must copy the CRASH.EXE file to a safe area manually.
As soon as you can log into the system, save the crash in the
XPN: area of the disk structure by typing the following command:
.R CRSCPY
CRSCPY>COPY
The CRSCPY program copies the file using a unique file name and
reports it when the operation is finished. For more information about
CRSCPY, refer to the TOPS-10 Operator's Guide.
You can use SYSTAT to obtain an overview of the status of the system
at the time of the crash. Use the /X switch to SYSTAT to indicate a
crash file, and include the name of the crash file. For example, to
examine the SYSTAT information for a crash file named SER003.EXE, type
the following command:
.SYSTAT/X XPN:SER003.EXE
The /X switch specifies that the SYSTAT program should read the file
XPN:SER003.EXE (the file name assigned by CRSCPY) instead of the
running monitor.
2.2 USING FILDDT
FILDDT is a system debugging tool designed for debugging files that
are stored on disk. Because FILDDT is a modified version of DDT, you
must be familiar with DDT before you attempt the procedures described
in the following sections. For more information about DDT, refer to
the TOPS-10 DDT Manual.
FILDDT has all the commands of regular DDT, with one major difference:
commands that control program execution do not work. Those commands
are:
$G Start the program.
$X Execute a single instruction.
$P Proceed with execution.
$B Set breakpoints.
The monitor, because of its large size, runs with local and global
symbols removed. You cannot examine the crash file without these
symbols, so you must load the symbol table of the monitor into memory
with FILDDT and save the modified version of FILDDT. To create this
special monitor-specific FILDDT, follow the procedure explained below.
First, run the standard version of the FILDDT program:
.R FILDDT
File:
You must type the name of the file from which the symbols are to be
loaded. This file must be the runnable monitor; that is, the monitor
before loading (often SYS:SYSTEM.EXE). Include the /S switch to
indicate that symbols are to be loaded.
File:SYS:SYSTEM.EXE/S
The /S switch tells FILDDT to load the symbols for this file. When
FILDDT displays another File: prompt, type <CTRL/Z> to exit from
FILDDT, then type the SAVE command to the monitor with the file name
you choose for the symbolic FILDDT, to save the runnable file. In
the following example, the symbolic FILDDT is called MONDDT.
File:^Z
.SAVE MONDDT
MONDDT saved
After you save the symbolic FILDDT program, you can use the RUN
command to start the new FILDDT at any time. For example, the
following commands start the symbolic FILDDT and give it the name of a
crash file (XPN:SER003.EXE) to examine:
.RUN MONDDT
File:XPN:SER003.EXE
When FILDDT reads the crash file, it reports the mapping of the ACs in
the following message:
[Looking at file DSKA:SER003.EXE[10,1]]
[Paging and ACs set up from exec data vector]
The monitor locations saved in the crash file must now be mapped to
the virtual monitor addresses. FILDDT provides special commands for
mapping the monitor and the user address space. Before you issue a
mapping command, FILDDT assumes all locations are physical references.
2.3 ESTABLISHING PROPER MAPPING
Virtual addressing machines require special consideration.
Instructions in programs are loaded into memory by a mapping scheme
based on page maps. The actual physical location of a word in the
monitor will not necessarily be the same as the virtual location.
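To make the distinction between virtual and physical locations concrete, the following Python sketch translates an address through a page map. It is a deliberate simplification: it assumes a single flat map from virtual page number to physical page number and pages of 1000 (octal) words, whereas the real hardware uses the process tables and pointer types described in Chapter 3. The function name and the sample map are hypothetical.

```python
PAGE_SIZE = 0o1000  # words per page (octal 1000 = 512 decimal)

def translate(page_map, virtual_addr):
    """Translate a virtual address through a flat, hypothetical page map."""
    vpage, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpage not in page_map:
        raise KeyError(f"virtual page {vpage:o} is not mapped")
    return page_map[vpage] * PAGE_SIZE + offset

# Hypothetical map: virtual page 340 (octal) resides in physical page 12 (octal).
page_map = {0o340: 0o12}
print(oct(translate(page_map, 0o340123)))
```

The offset within the page is unchanged by translation; only the page number is replaced.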
The symbolic FILDDT contains the virtual address of each location, but
not its physical address. You must map FILDDT memory references
through the Exec Process Table (EPT) to examine monitor locations, or
through the User Process Table (UPT) to examine user locations. To
establish mapping, you must perform the following steps:
1. Find the page numbers of the page maps.
2. Issue the FILDDT mapping instruction (a $nU command).
3. Verify that the mapping is correct.
The following sections describe two methods for mapping the dump and
obtaining preliminary information concerning the state of the
processor at the time of the crash. The instructions used in the
following procedure may be included in a FILDDT command file (also
called a patch file).
To map a crash, you must provide FILDDT with pointers to mapping
tables and other locations in the monitor. The mapping tables and
monitor locations are described in more detail in Chapters 3 and 4.
2.3.1 FILDDT Mapping Instructions
FILDDT allows you to specify the type of address mapping to use in
locating information. You can specify virtual or physical addressing.
The mapping instructions are:
$U enables virtual addressing. This instruction also sets the
FAKEAC flag, indicating that physical locations 0-17 are to
be interpreted as the user accumulators (ACs).
$$U enables physical addressing. The FAKEAC flag is cleared,
indicating locations 0-17 are interpreted as hardware
registers 0-17.
By default, physical addressing is enabled. FILDDT interprets all
addresses as physical until you issue a virtual mapping instruction.
The mapping is correct only for the data in portions of the monitor's
low segment, because the low segment virtual addresses equal the
physical addresses.
The TOPS-10 monitor uses KL-paging, also called "extended addressing"
(described in Section 3.3). By default, FILDDT is enabled for
KL-paging. If it is necessary to disable KL-paging (for an older
version of the monitor, for example), you can issue the following
command to FILDDT:
0$11U
To enable KL-paging, type the following command:
1$11U
The command n$11U establishes the mapping scheme so that FILDDT will
read the page maps correctly.
Next, you must point FILDDT at the correct page maps that associate
virtual addresses (loaded into the symbolic FILDDT) with the physical
addresses (saved in the crash file), and establish virtual mapping.
2.3.2 Mapping the Crash
To map virtual addresses to physical ones, FILDDT needs the locations
of the Exec Process Table (EPT) and the Special Pages Table (SPT).
The EPT allows FILDDT to map exec virtual memory. The SPT is used to
map the user job that was running at the time of the crash.
On a multiple-processor KL system, the dump contains an EPT for each
CPU in the system. To analyze the dump, you must map FILDDT through
the EPT for the CPU that crashed. A CPU Data Block (CDB) exists for
each CPU in the system. On a single-processor system, there is one
CDB. The CDB contains the address of the EPT. Therefore, you must
first find the CDB for the CPU that crashed. The location DIECDB
contains the pointer to the CDB of the CPU that crashed.
NOTE
The contents of DIECDB are written when the system
crashes, but not when the system hangs. When you are
analyzing a hung system, the contents of DIECDB (if
nonzero) were written by a previous crash, and
therefore may be invalid.
You can see the contents of DIECDB by typing the following command to
FILDDT:
DIECDB[ 12000
In this example, the physical starting address of the CDB is 12000.
The location of the EPT is stored in the CDB at the offset symbolized
by .CPEPT. Use the following command to open .CPEPT and read its
contents:
$Q+.CPEPT-.CPCDB[ 1000
The first part of the command ($Q) refers to the last value displayed
(that is, the contents of the location just opened), which is 12000 in
this example. Adding the difference between the values of .CPEPT and
.CPCDB to that value yields the address of the .CPEPT word within the
CDB for the CPU that crashed. The command opens that location and
displays its contents. In this example, the EPT starts at physical
location 1000.
FILDDT needs the page number for the EPT, not its physical address.
Therefore, you must divide the contents of .CPEPT by 1000 (octal), the
number of words in a page.
Submit the result of this division operation to FILDDT using the $0U
command. For example, to calculate the page number and map the EPT,
type the following FILDDT instruction:
$Q'1000$0U
This command divides the previous value (using the $Q command) by 1000
and submits the result to FILDDT as the EPT page number. In this
example, the page number is 1.
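
The page-number arithmetic can be checked outside FILDDT. The
following Python sketch is illustrative only: the function name is
hypothetical, and it assumes nothing beyond the division by 1000
(octal) described above.

```python
PAGE_SIZE = 0o1000  # words per page (1000 octal), the divisor used above


def ept_page_number(ept_address: int) -> int:
    """Return the page number that contains a physical address."""
    return ept_address // PAGE_SIZE


# The EPT address read from .CPEPT in the example is 1000 (octal).
print(oct(ept_page_number(0o1000)))
```

Octal literals (the 0o prefix) keep the values in the same radix that
FILDDT displays.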
Exec virtual memory is mapped after the $0U command. This is
sufficient for examining monitor memory locations in the crash.
However, to examine user data, you must map the current user job. The
FILDDT command n$6U maps the user job and its associated per-process
storage in exec virtual memory (funny space). The value of n is the
page number of the UPT (User Process Table).
The SPT contains a word for the current job running on each CPU in the
system, plus a word for each user job. The right half of each SPT
slot contains the page number of the UPT for the current CPU. When
extended addressing is enabled, the SPT points to the UPT.
The following FILDDT command sets the SPT base address:
JBTUPM+(job#)-(CPU#)$6U
To map a user job other than the current job on the current CPU, add
the contents of the right half of JBTUPM to the job number, then
submit the result to the $U command.
FILDDT provides temporary registers to contain either hardware
registers or user accumulators. When hardware mapping is established,
FILDDT assumes that locations 0-17 refer to hardware registers 0-17.
However, when you issue a virtual mapping command ($U), the user ACs
can be mapped through the temporary registers. This allows you to
load the user ACs into the temporary registers and then refer to the
user ACs as locations 0-17.
Use the following FILDDT command to open the current AC block and map
it to the temporary registers provided by FILDDT:
.CPACA[ $Q$5U
This instruction is useful only if the location .CPACA contains the
address of the current AC block. If, however, a UUO at interrupt
level occurs (UIL stopcode), this instruction cannot be used
successfully. Instead, you must locate the current AC block by
determining which interrupt level was in progress at the time of the
crash. The AC blocks and interrupt levels are described in more
detail in Chapter 3.
The user job in memory may not match the UPT currently in use at the
time of the crash. You can check the user job that was running by
comparing the contents of offset .CPJOB in the CDB with the contents
of .USJOB in the UPT. If these values do not match, the interrupt
routine was switching UPTs at the time of the crash; use the UPT for
the job number that is in .USJOB.
Examine code that you are familiar with in the high segment to make
sure the dump is mapped correctly. Also check location 410 (ABSTAB),
which should point to NUMTAB, one of the first locations in the low
segment.
If you set up mapping through the wrong page map, FILDDT returns a
question mark whenever you try to reference an unmapped location. For
example, this could occur if you use the null job's UPT to set
mapping. To reset mapping, use the "$$U" command, which restores
physical mapping in FILDDT.
2.4 VERIFYING THE DUMP
Occasionally, your monitor will crash in the process of upgrading to a
new version, or when you are making modifications to the code. In
these cases, it is possible that your crash file will be based on a
different version of the monitor than the monitor-specific FILDDT you
created. You should make sure that the symbols in the
monitor-specific FILDDT match the crash that you are examining. If
values of the symbols do not match, the information in the crash file
may be useless, misleading, or corrupted.
There are several ways to check the symbols. One is to make sure the
version number of the crashed monitor matches that of your current
monitor. Another is to examine addresses in the monitor with known
contents and verify that they contain the right information.
Monitor location CNFDVN contains the monitor version number and edit
number. This version number should match the version number displayed
by the DIRECTORY monitor command.
.DIRECTORY IEZ093.EXE
IEZ093 EXE 8196 <155> dd-mmm-yy 704(33432) DSKB:[10,1]
.RUN MONDDT
File:DSKB:IEZ093.EXE[10,1]
[Looking at file DSKA:SER003.EXE[10,1]]
[Paging and ACs set up from exec data vector]
$$C
CNFDVN/ 70400,,33432
Note that the DIRECTORY command reports version and edit numbers
704(33432), matching the contents of CNFDVN: version 704 in the left
half (displayed as 70400), and edit 33432 in the right half.
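
The comparison can also be made mechanically. The following Python
sketch is not part of any DIGITAL tool. It assumes that CNFDVN follows
the usual version-word layout (major version in Bits 3-11, minor
version in Bits 12-17, edit number in Bits 18-35), which is consistent
with the value shown above. The function name is hypothetical.

```python
def decode_version_word(left_half: int, right_half: int) -> tuple[int, int, int]:
    """Split a version word, given as two 18-bit halves.

    Assumed layout: Bits 3-11 major version, Bits 12-17 minor version,
    Bits 18-35 edit number.
    """
    major = (left_half >> 6) & 0o777
    minor = left_half & 0o77
    edit = right_half & 0o777777
    return major, minor, edit


# CNFDVN/ 70400,,33432 from the example above.
major, minor, edit = decode_version_word(0o070400, 0o033432)
print(f"{major:o}({edit:o})")  # prints 704(33432)
```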
You can obtain the name of the monitor by reading ASCII text starting
at location CONFIG, as shown in the following example:
CONFIG$0T/ RL371A DEC10 Development
In this case, the full system name is "RL371A DEC10 Development".
If these values match, you can be relatively sure that the
monitor-specific FILDDT and crash file match.
2.5 FILDDT COMMAND FILES
FILDDT command files are used to map a dump and obtain preliminary
information that might be relevant to analyzing the crash. A command
file is a set of FILDDT commands that are executed automatically when
you issue the $Y command to FILDDT. Command files are also used to
edit the runnable monitor (as opposed to making edits to source
modules and rebuilding the monitor).
The FILDDT command $Y invokes a series of FILDDT commands stored in a
file on disk. This allows you to easily execute a set of commands
that you use frequently instead of typing them in. You could use a
command file to map and verify a dump and to extract information you
are likely to need while diagnosing a crash, as described below.
NOTE
The $ (dollar sign) is displayed when you press the
ESCape key in FILDDT. It is used here to show where
you must insert an ESCape character into the file.
Most text editors require a special procedure for
inserting ESCape and other non-printing characters
into a file. You must use the text editor
documentation to find the method for quoting
characters if you do not know how to insert an ESCape
character into a file.
The following command file maps a crash file for a multiple-processor
KL system. The same command file is equally useful on a single-CPU KL
or a KS system. The command file also verifies the correspondence of
the dump with the monitor-specific FILDDT and displays pertinent
system information about the crash.
Comments are included here to describe the functions of the commands.
However, FILDDT will not accept a command file with comments. Your
actual command file should NOT contain the comments in the following
example:
.TYPE VERIFY.DDT        ;display contents of patch file
DIECDB[                 ;gets addr of CDB for CPU that crashed
$Q+.CPEPT-.CPCDB[       ;gets addr of the EPT
$Q'1000$U               ;divides addr by 1000 to get page number
SPTTAB$6U               ;sets the SPT base address
.CPACA[$Q$5U            ;maps AC references
.CPCPI[                 ;gets PI status
.CPPGD[                 ;gets DATAI PAG results
.CPSPT[[                ;gets the address of the SPT
.CPDWD[                 ;gets CPU's DIE interlock word
.CPCPN[                 ;CPU number of crashed CPU
.CPJOB[                 ;gets job number of current job
.USJOB[                 ;job number in funny space
.CPTCX[                 ;process context word on page fails
You can include these and other FILDDT commands in a command file to
obtain initial information about the crash. The locations referenced
in this file are described in Chapter 3.
The following example shows the types of information that might be
displayed and how to interpret the information. Again, the comments
are included for descriptive reasons, but comments are not allowed in
an actual command file.
.R MONDDT               ;run the symbolic FILDDT
File: SYS:CRASH
[Looking at file DSKA:SER003.EXE[10,1]]
[Paging and ACs set up from exec data vector]
$Y                      ;execute a command file
File: MON.DDT           ;command file is MON.DDT
DIECDB[ 13000           ;the address of the CDB for the
                        ;CPU that crashed is 13000
$Q+.CPEPT-.CPCDB[ 3000  ;compute the offset into the CDB where
                        ;the address of the EPT is stored
$Q'1000$U               ;compute the page number of the EPT
                        ;and point FILDDT to the EPT
SPTTAB$6U               ;set the SPT base address
.CPACA[ 402077$Q$5U     ;map AC references
.CPCPI[ 377             ;377 indicates PI levels are enabled
.CPPGD[ 700100,,2600    ;DATAI PAG shows that:
                        ;current AC block is 0 (exec)
                        ;previous AC block is 1 (user)
                        ;previous context section is 0 (exec)
                        ;UPT page number is 2600
.CPSPT[SPTTAB+1[ 2600   ;shows UPT page number of currently
                        ;mapped job on this CPU
.CPDWD[ 0               ;Die interlock word
.CPCPN[ 1               ;CPU1 failed
.CPJOB[ 5               ;Job 5 was running
.USJOB[ 5               ;Job 5 is mapped on this CPU
.CPTCX[ 701100,,2364    ;Process context information:
                        ;current AC block is 1 (user)
                        ;previous AC block is 1 (user)
                        ;previous context section is 0
                        ;user base page number is 2364
It is important to compare the value of .CPTCX with the contents of
.CPPGD. The process context word stored in .CPTCX and the DATAI PAG
word stored in .CPPGD are different when the state of the processor at
the time of the crash is indeterminate (for example, for IME or EUE
stopcodes).
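
The comparison of the two words can be sketched in Python as follows.
This is an illustration, not a DIGITAL utility. The field positions
(current AC block in Bits 6-8, previous AC block in Bits 9-11,
previous context section in Bits 13-17, UPT page number in Bits 23-35)
are assumptions chosen to agree with the annotated sample values
700100,,2600 and 701100,,2364. The function names are hypothetical.

```python
def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last of a 36-bit word (bit 0 is the leftmost)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def decode_context_word(left_half: int, right_half: int) -> dict:
    """Decode a DATAI PAG or process context word (assumed layout)."""
    word = (left_half << 18) | right_half
    return {
        "current_ac_block": field(word, 6, 8),
        "previous_ac_block": field(word, 9, 11),
        "previous_section": field(word, 13, 17),
        "upt_page": field(word, 23, 35),
    }


cppgd = decode_context_word(0o700100, 0o2600)  # sample .CPPGD contents
cptcx = decode_context_word(0o701100, 0o2364)  # sample .CPTCX contents

# List the fields in which the two words disagree.
differing = sorted(name for name in cppgd if cppgd[name] != cptcx[name])
print(differing)  # prints ['current_ac_block', 'upt_page']
```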
2.6 STOPCODE INFORMATION
The following information is useful when the system crashed with a
stopcode. You can determine the stopcode information by looking at
the CTY for the CPU that crashed. The stopcode name is printed on the
CTY, and is stored in location .CnSNM, where n is the CPU number. Use
the Stopcodes Specification in the TOPS-10 Software Notebook Set to
look up the module that generated the stopcode.
The stopcode routines in the monitor also store and print the
following types of information on the CTY:
o Date and time of crash
This information is stored in a series of locations starting
at LOCYER:
LOCYER - Year of the crash
LOCMON - Month of the crash
LOCDAY - Day of the crash
LOCHOR - Hour of the crash
LOCMIN - Minute of the crash
LOCSEC - Second of the crash
Remember to display these locations in decimal, not octal.
o Current job
The word at address .CnJOB holds the job number of the
current job on CPUn.
o PPN of current job
The PPN is stored in the JBTPPN table, indexed by the job
number.
o Program name of current job
The program name is stored in SIXBIT in the JBTNAM table,
indexed by the job number.
o Terminal of current job
The terminal name is stored in SIXBIT in the first word of
the Terminal DDB, pointed to by TTYTAB (indexed by job
number).
o CPU number
The CPU number of the CPU that crashed is determined from the
value of .CnDWD, where n is the CPU number. Test this symbol
for a negative value (-1) for each CPU in a multiple-CPU
system. A negative value indicates that the CPU did not
crash. If the contents of .CnDWD are equal to zero, the
current CPU is the CPU that crashed.
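
The .CnDWD test for a multiple-CPU dump can be expressed as a short
Python sketch. The values in the example are hypothetical; in practice
you would read each .CnDWD word with FILDDT. The function name is also
hypothetical.

```python
MINUS_ONE = 0o777777777777  # -1 as a 36-bit two's-complement word


def crashed_cpus(dwd_by_cpu: dict[int, int]) -> list[int]:
    """Return the CPU numbers whose DIE interlock word is zero."""
    return [cpu for cpu, dwd in sorted(dwd_by_cpu.items()) if dwd == 0]


# Hypothetical contents of .C0DWD and .C1DWD in a two-CPU dump.
print(crashed_cpus({0: MINUS_ONE, 1: 0}))  # prints [1]
```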
Refer to Section 5.2 for more information about the types of stopcodes
and the information they provide.
CHAPTER 3
LOCATING THE FAILURE
The monitor is the portion of the software that is responsible for
interfacing user programs to hardware. Specifically, the monitor is
responsible for the following functions:
1. Performing tasks for a user before and after running a
program, such as copying or deleting files, finding the
status of the system, and running or stopping programs.
TOPS-10 provides the user interface in the form of the
command language.
2. Executing the program. The user must make requests for all
services (including I/O). The user programming interface is
standardized in the form of monitor calls, also called
Unimplemented User Operators (UUOs).
3. Providing access to the data base. This is done by creating
a logical file system for data stored on disk devices.
4. Controlling CPU usage. A timesharing system must know how to
determine who should get control of the computer. This is
called scheduling.
5. Controlling memory usage. For the system to run efficiently,
jobs must be moved in and out of memory at the right time.
This operation is known as swapping and paging.
6. Controlling access to sharable devices. The main sharable
devices on timesharing systems are disks. Because many jobs
will be using files on the same disk drive, adequate control
must be maintained to prevent destructive interference.
7. Controlling access to single-user (non-sharable) devices.
The monitor must implement a way to allocate these devices to
the right users and control the I/O. TOPS-10 does this with
the GALAXY batch and spooling system.
8. Providing error analysis when hardware or software errors
occur (DAEMON and SPEAR).
9. Providing accounting information so the system can be fairly
allocated and users charged for what they use (ACTDAE).
3.1 HARDWARE MAPPING
The hardware uses three types of tables (process tables, section maps,
and page maps) to establish and maintain the mapping of memory
locations for a job:
o The process table describes characteristics for a specific
job and includes a pointer to each section map required to
map the job. There are two process tables: the Exec Process
Table (EPT) and User Process Table (UPT).
o The section map contains pointers to the page map for each
virtual section for the monitor or user job.
o The page maps contain locations for each physical and virtual
page allocated to the monitor or user job.
The paging system uses two process tables: the UPT to map the user
job and the EPT to map the monitor. The UPT (User Process Table) is
the table used to describe user address space. Each user job has its
own UPT, which must be loaded before the job can be run. The EPT
(Exec Process Table) is used to describe the monitor address space.
The processor runs by switching between user mode and exec mode. To
perform address translation quickly, the hardware must know the
locations of the process tables. Two registers are used to find the
process tables: the User Base Register (UBR) points to the UPT and
will vary for each job that is loaded into memory. The Exec Base
Register (EBR) points to the EPT. On multiple-CPU systems, each CPU
has an EBR and a UBR at all times.
3.2 PAGING POINTERS
The page maps contain pointers to physical pages of data. The page
maps are read by the microcode, which evaluates two kinds of pointers:
section pointers that point to section maps, and page map pointers
that point to physical pages. Section and page pointers have
identical formats. There are four types of pointers, indicated by a
code stored in Bits 0-2 of the word. The access code is applied to
the address by ANDing Bits 3-6 of all pointers used to evaluate the
address.
The pointer to non-accessible pages has code (0) in Bits 0 through 2.
The pointers to accessible pages also include accessibility codes in
Bits 3 through 6. Bit 3 (P), if set, indicates that the page is
public. Bit 4 (W) indicates whether the page is writable, and Bit 6
(C) indicates whether the page can be cached.
Bit 5 of the pointer to an accessible page is used by the MCA25
hardware option as the "Keep Me" bit. That is, if Bit 5 is set in the
page pointer, the address translation for that page is not cleared in
the hardware pager, provided that the DATAO PAG (context switch) is
issued with Bit 3 set.
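
The way access bits combine along a translation path can be sketched
in Python. This is an illustration of the ANDing rule described in
Section 3.2, not the microcode algorithm. The pointer words are built
by a hypothetical helper, and the nonzero type code used here (1) is
only a placeholder for "accessible".

```python
def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last of a 36-bit word (bit 0 is the leftmost)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def make_pointer(code: int, public: bool, writable: bool,
                 cacheable: bool, page: int) -> int:
    """Build a hypothetical pointer word for the demonstration below."""
    return ((code << 33) | (public << 32) | (writable << 31)
            | (cacheable << 29) | page)


def effective_access(pointers: list[int]) -> dict[str, bool]:
    """AND together Bits 3-6 of every pointer used in the translation."""
    access = 0o17
    for pointer in pointers:
        access &= field(pointer, 3, 6)
    return {
        "public": bool(access & 0o10),     # Bit 3 (P)
        "writable": bool(access & 0o04),   # Bit 4 (W)
        "cacheable": bool(access & 0o01),  # Bit 6 (C)
    }


# A writable section pointer leading to a read-only page pointer.
section_pointer = make_pointer(1, False, True, True, 0o1234)
page_pointer = make_pointer(1, False, False, True, 0o2345)
print(effective_access([section_pointer, page_pointer]))
```

In this example the page is not writable, because the W bit is clear
in one of the two pointers, but it remains cacheable.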
3.3 EXTENDED ADDRESSING
The KL processor uses KL-paging to allow code and data to be grouped
into virtual sections; each section is a maximum of 512 pages of
virtual memory. The monitor layout for a KL with extended addressing
enabled is illustrated in Appendix A.
The KS processor does not support extended addressing. However,
because KL-paging is required in order to run TOPS-10 Version 7.04,
the KS processor simulates KL-paging by choosing an alternate page map
when necessary.
The primary page map for the KS monitor is the Section 0 page map. To
perform a monitor call to an extended section, the KS monitor changes
the page map pointer. For example, to execute the DNET. monitor
call, a special macro reads the Section 2 page map pointer (from
SECTAB+2 in the EPT) and writes the address into the Section 0 page
map pointer (at SECTAB in the EPT). The KS accesses locations in the
Section 2 page map until the monitor call has been serviced. A
similar macro restores the Section 0 page map pointer to SECTAB.
3.4 MONITOR-RESIDENT USER DATA
Some information that pertains to the specific user is kept in the
monitor's address space, in the exec page maps. Each word in a page
map can point to a physical page in memory, but the Section 0 Page Map
also contains indirect pointers to the UPT. The monitor uses these
virtual addresses to reference job-specific locations, such as funny
space.
The job-specific data in monitor address space is composed of the
following areas, which are described separately below.
Funny Space (Per-Process Area)
UPT
.UPMAP (Section 0 page map)
.UPMP/.UUPMP (UPT origin)
JOBDAT
Vestigial JOBDAT
The information in these pages is specific to the current user, so the
job's page maps in the crash file contain virtual and physical
addresses. In a multiple-CPU system, the SPT (Special Pages Table)
for that CPU contains the current user page map page. When a new job
is selected to run, only the UBR and the SPT words need to be changed.
Certain pages of the executive virtual address space are designated as
the per-process monitor free core, also known as funny space, for the
job that is currently running on that CPU. This is monitor memory
that is swapped with the job, and contains information pertaining to
its disk DDBs, monitor buffers, SWITCH.INI, the extended channel
table, and so forth.
The monitor references the user's funny space with the symbol .UPMP,
which points to the first location in the UPT, and reads the physical
location in memory from the page table for user page 0.
User page 0 contains JOBDAT locations, which are used by the monitor
for handling the user job.
Vestigial JOBDAT is the job data area for the job's high segment.
3.5 PROGRAM COUNTER WORD
The PC (Program Counter) double-word contains the location of the next
instruction that the system will execute, including flags to indicate
whether the processor is in user mode or exec mode. The PC is stored
in the job's UPT (at USRPC) and in the CDB (at .CnPC). When you
analyze a crash, you must examine Bit 5 of the PC word to determine
whether the processor was in user mode or exec mode at the time of the
crash. If Bit 5 of the PC is set, the processor was in user mode. If
Bit 5 is clear, the crash occurred in exec mode. The remaining PC
flags indicate arithmetic overflow conditions and so forth.
The PC contains a 30-bit address, which points to the next
instruction to be executed. When control passes to a section other
than the section where the instruction was issued, that instruction
must refer to a 30-bit address. To store the 30-bit PC with flags,
the flag-PC doubleword is used. The flag word contains the PC flags in
Bits 0-12, in a format identical to the single-word PC. Bits 13-17
are unused. The right half of the first word is used by the hardware.
The second word contains the section number and address. Bits 0 through
5 of the second word are zero. The format of the PC doubleword allows
the flags (including the mode bit) to be read in the same manner as a
single-word PC. You can also read the address in a double-word PC in
the same way as a single-word PC, after you add 1 to the location of
the PC word.
Most instructions that use 30-bit addresses cannot be issued in
Section 0. Global section references are illegal in Section 0, except
for the OWGBP instruction, the XJRST and XJRSTF instructions, and the
XBLT function of the EXTEND instruction. Any other instructions with
global section references must be made from a non-zero section.
3.6 PROCESSOR MODES
The processor reads the PC to determine whether the instruction is to
be executed in user or exec mode. User mode allows user jobs to run
programs and request the monitor for system resources. Exec mode
allows the monitor to satisfy user requests for system resources and
perform overhead functions.
You can determine the processor mode at the time of the crash by
reading the PC word from the CDB. Bits 5 and 7 of the PC word are
useful in determining the processor mode. If Bit 5 is clear, the
processor was in exec mode. If Bit 5 is set, the processor was in
user mode. In user mode, if Bit 7 is set, the job is in public mode;
if Bit 7 is clear, the job is in concealed mode. In exec mode, if Bit
7 is clear, the process is in kernel mode. If Bit 7 were set in exec
mode, this would establish supervisor mode, but this mode is not used
by TOPS-10.
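
The mode test can be written out as a small Python sketch. It takes
the left half (flag half) of the PC word and applies only the Bit 5
and Bit 7 rules given above. The function and constant names are
hypothetical.

```python
USER_BIT = 1 << (17 - 5)    # Bit 5 of the 18-bit flag half
PUBLIC_BIT = 1 << (17 - 7)  # Bit 7 of the 18-bit flag half


def processor_mode(flag_half: int) -> str:
    """Classify the processor mode from the PC flags."""
    user = bool(flag_half & USER_BIT)
    public = bool(flag_half & PUBLIC_BIT)
    if user:
        return "user public" if public else "user concealed"
    # Bit 7 set in exec mode would mean supervisor mode (unused by TOPS-10).
    return "exec supervisor" if public else "exec kernel"


print(processor_mode(0o012000))  # prints user public
print(processor_mode(0o000000))  # prints exec kernel
```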
Processor modes, PCs, and paging pointers are described in the
DECsystem-10/20 Processor Reference Manual.
3.6.1 User Mode
Normally a user program runs in user mode. When the program requests
a monitor service, using a monitor call, the current processor flags
and PC are saved. The program is stopped temporarily while waiting
for the monitor service to be completed; this is called "blocking."
Control of the processor is then passed to the monitor in exec
(kernel) mode by clearing the processor flags and starting at a new
PC.
When an I/O operation is requested or completed, a device interrupt
causes the monitor to service the device. On a regular basis, the
monitor receives a clock interrupt, which initiates job scheduling and
system maintenance (overhead functions). When the clock service
routine is finished, control passes to the appropriate user program,
and the processor switches back to user mode by setting the flag bits
(Bits 5 and 7) and restoring the user's PC.
A user program runs in either User Public or User Concealed mode.
User mode begins with a monitor command and ends when the program
exits or encounters an error. Normally the program runs in public
mode: Bits 5 and 7 of the PC word are set. The user program runs in
concealed mode if Bit 7 is clear and Bit 5 is set.
3.6.2 Exec Mode
When a user program requests a service by the monitor, using a monitor
call or a command, the processor must switch from user mode to exec
mode. Exec mode allows the monitor to perform privileged services and
provides the user's interface to file management, device control, and
hardware communication in general.
User programs run in user mode, and cannot perform direct I/O
instructions. A range of I/O instructions, with device codes from 740
to 774, are reserved for customer definition, and are therefore
designated as unrestricted codes.
When a UUO is executed, a hardware trap condition occurs, causing the
microcode to store the following information in the UPT:
o PC doubleword
o 30-bit effective address
o Opcode and AC (from the instruction)
o Process context word
The new PC word is taken from one of the MUUO dispatch locations in
the UPT, depending on the processor mode and whether or not the UUO
occurred during the processing of another trap condition (a PDL
overflow, for example). Control passes to the MUUO routine in the
monitor, where UUO processing begins. The monitor uses AC Block 0.
The user program uses AC Block 1. To switch to AC Block 0 from Block
1, the monitor issues the following instruction:
DATAO PAG, addr
Where: addr contains the value [400100,,0].
When the job is not running, the user accumulators are stored in
JOBDAT in the user's address space. The monitor's accumulators are
stored in the next higher locations in the user's address space.
Once in the MUUO routine, the monitor checks the UUO for legality by
checking the instruction stored in .USMUO of the UPT. The return PC
from USRPC in the UPT is placed on the monitor's stack for this job.
Then control passes to the appropriate routine to perform the function
for the user.
The execution of the user function may finish or it may block, waiting
for something to happen (I/O, for example), before it can continue.
If control can be returned to the user job, the user AC set is
restored and control passes to the location pointed to by the PC in
USRPC. If the job blocks, the monitor goes to clock level. After the
blocking condition is serviced, the job can run again. At the time of
the block, the monitor's PC is stored at USRPC in the UPT.
The MUUO routine uses a stack, also located in the UPT, which the
monitor can address because it is mapped through a monitor virtual
address (refer to Section 3.4).
Some values in the UPT can be cached without interfering with the
system, such as the stack. These locations are referenced by the
symbol .UUPMP. Other locations are not cached; they are referenced by
the symbol .UPMP, which also points to the first location of the UPT.
On a single-CPU system, the monitor caches the contents of all
locations in the UPT from .UUPMP to .UPMP. On multiple-CPU systems,
however, the system only caches the contents of .UUPMP.
3.7 THE PRIORITY INTERRUPT SYSTEM
In exec mode, the monitor can service the user program, a device
request, or a clock-level interrupt. Interrupts can be caused by
devices or by the clock. While in exec mode, the monitor services
interrupts according to the Priority Interrupt (PI) level assigned to
the interrupting process. A typical set of priority interrupt levels
(also called PI channels) might be:
Level 0     DTE (Byte Xfer,Deposit,Examine only)
            CI/NI (limited set of functions only)
Level 1     none
Level 2     DTA (DECtape)
Level 3     Card reader, APR, clock
Level 4     Line printer, magtape, NI, DTE (doorbell)
Level 5     Disk, CI
Level 6     ANF-10 network
Level 7     Monitor
To distinguish the interrupt level of the system at any one time, four
pieces of information are used:
o The set of accumulators currently in use, which reveals the
stack in use.
o The processor mode (exec or user).
o The status of the PI system.
o The process context word. When the monitor is called to
perform a service for a user job, as with a command or UUO,
the microcode creates the job's process context word and
writes it into the UPT. This process context word is
displayed by a DATAI PAG instruction where Bit 2 is cleared,
and contains the current AC block number, the previous AC
block number, section bits, and the current UBR (User Base
Register).
A summary of the interrupt levels and how to distinguish them is shown
in the following table:
Table 3-1: Interrupt Level Indicators
______________________________________________________________________
                      AC
                      Block   PDL       Mode   PI Status
______________________________________________________________________
User Job              1       Variable  User   No PIs active
Null Job              1       N/A       User   No PIs active
UUO Level             0       JOBPDO    Exec   No PIs active
Clock Level           0       NUnPDL    Exec   PI 7 active
Device Interrupts:
  Terminal driver     2       CnxPD1    Exec   PI SCNCHN active
  Disk service        3       CnxPD1    Exec   PI DSKCHN active
  Network service     4       CnxPD1    Exec   PI NETCHN active
  Other (level y)     0       CnyPD1    Exec   PI y active
Page Fail             0       NUnPDL    Exec   Variable
                              ERnPDL
______________________________________________________________________
You can find the stack by finding the current set of ACs. The process
context word, stored in the UPT, contains the current AC block.
You can determine the status of the priority interrupt system by
looking at the PI status word, stored at location .CnCPI in the CDB.
This word is read by the monitor with a CONI PI instruction and stored
in the CDB when the monitor starts to process a stopcode. Using this
information you can determine whether the PI system was enabled, what
PI levels were enabled, and what kinds of interrupts were in progress.
The PI status word on a KL system has the following format:
Bits     Meaning
0-10     Not used.
11-17    Level on which a program requests an interrupt
         (Bit 11 = Level 1, Bit 12 = Level 2, and so
         forth).
18-20    Write even parity (KL diagnostics only).
21-27    Levels on which an interrupt is in progress.
28       PI system is on.
29-35    Enabled levels.
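
The PI status word can be decoded with a short Python sketch based on
the bit layout listed above. The sketch assumes that each seven-bit
group runs from Level 1 to Level 7, as stated for Bits 11-17. The
function name is hypothetical.

```python
def decode_pi_status(word: int) -> dict:
    """Decode a KL PI status word using the bit layout listed above."""
    def bit(n: int) -> int:
        # Bit 0 is the leftmost bit of the 36-bit word.
        return (word >> (35 - n)) & 1

    def levels(first_bit: int) -> list[int]:
        # Seven consecutive bits; assumed to run from Level 1 to Level 7.
        return [level for level in range(1, 8) if bit(first_bit + level - 1)]

    return {
        "requested": levels(11),
        "in_progress": levels(21),
        "pi_on": bool(bit(28)),
        "enabled": levels(29),
    }


# A PI status value of 377 (octal).
status = decode_pi_status(0o377)
print(status["pi_on"], status["enabled"])  # prints True [1, 2, 3, 4, 5, 6, 7]
```

A value of 777 (octal) would additionally show Level 7 in progress,
which corresponds to clock level in Table 3-1.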
3.8 THE DEVICE INTERRUPT SERVICE
A device interrupt occurs when an I/O transfer is complete, a device
has changed status, or an error has occurred. There are two types of
device interrupts: vectored and nonvectored interrupts. A
nonvectored, or standard, interrupt is handled by the software. The
interrupt handling instruction is read from the EPT and control passes
to the CONSO skip chain to determine the device that generated the
interrupt. Section 3.8.1 describes standard, nonvectored interrupts.
The DTEs (doorbell function only), the interval timer (on the same
level as APR interrupts), RH10, and RH20 MASSBUS controllers all
perform vectored interrupts. Vectored interrupts are not dispatched
by the software but are automatically dispatched by the microcode.
Section 3.8.2 describes nonstandard, vectored interrupts.
3.8.1 Standard Interrupts
An interrupt can occur on Levels 1 through 7 only if the PI system is
turned on, there are no higher-level interrupts in progress, and the
PI system is enabled for interrupts on the level on which the
interrupt is requested. If these conditions are met, the interrupt
will stop the processor and turn on a bit in the PI status word. The
bit indicates the level on which the interrupt is requested. The
processor then executes the instruction for handling an interrupt on
the requested PI level.
The location of the interrupt handling instruction is stored in the
EPT. The exact location in the EPT is calculated from the following:
EPT+40+2*n ;where n is the PI interrupt level
The next instruction to execute in the handling of the interrupt is
stored in the EPT and depends on the PI level on which the interrupt
was requested. The above calculation results in an offset into the
EPT where the instruction is stored. Thus, if a BA10 (unit record)
I/O bus controller is assigned to PI Level 2, the formula would result
in EPT+40+(2*2). The system then executes the instruction stored at
offset 44 into the EPT.
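
The offset calculation is simple enough to check by hand, but a Python
sketch makes the octal arithmetic explicit. The function name is
hypothetical. Level 0 is rejected because it is handled by the
microcode rather than through an instruction in the EPT.

```python
def interrupt_instruction_offset(level: int) -> int:
    """Return the EPT offset of the interrupt instruction for a PI level."""
    if not 1 <= level <= 7:
        raise ValueError("standard interrupts use PI Levels 1 through 7")
    return 0o40 + 2 * level


print(oct(interrupt_instruction_offset(2)))  # prints 0o44
```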
Interrupt level 0 is reserved for certain types of I/O transfers with
DTE and CI/NI (KLIPA/KLNI) devices. Level 0 bypasses the software and
is handled by the microcode, which handles interrupts on Level 0
automatically without requiring the software to store context
information and so forth.
In general, the interrupt instructions in the EPT are formatted as:
XPCW CHnm
where n is the CPU number (omitted for CPU0), and m is the level
number on which the interrupt is in progress. For example, CH7 means
Level 7 on CPU0. CH27 indicates Level 7 on CPU2 (the third CPU in the
configuration). The new PC flags at CHnm+2 usually include the
previous context user flag. This allows the interrupt service routine
to access the user's address space using the PXCT instruction.
The location following each XPCW in the EPT contains an instruction
that will cause an I/O page fail condition (setting the APR flag),
which will usually result in an IOP stopcode.
Using a data structure known as the CONSO skip chain, the interrupt
routine polls the devices on that interrupt level and services the
interrupt. With the XPCW instruction, control passes to the skip
chain. Each channel has its own skip chain, starting at the address
pointed to by CHnm+3, whose function is to find the specific device
that created the interrupt and then service its needs.
The monitor performs CONSO instructions to decide which device
generated the interrupt. If it finds the interrupting device, control
passes to the interrupt handling routine. If the device is not
requesting an interrupt, the monitor performs a JRST instruction to
the next CONSO instruction. If it reaches the end of the CONSO skip
chain, it dismisses the interrupt with the following instruction:
XJEN CHnm
When control passes to the interrupt handling routine, the monitor
reads the status of the device, using a CONI or DATAI instruction. On
that basis, it may stop the device, advance buffer pointers, or
perform cleanup operations. A CONO or DATAO instruction clears the
device interrupt status. Failure to do so would cause continual loops
in the interrupt handling routine, and eventually the keep-alive count
would expire.
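The polling logic described above can be modeled in a few lines. The
following Python sketch is a simplified model, not monitor code: the
class, the device names (LPT, DSK), and the mask values are all
hypothetical, and a real skip chain is a sequence of CONSO and JRST
instructions rather than a list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChainEntry:
    """One hypothetical link in a skip chain."""
    name: str
    interrupt_mask: int               # status bits tested by CONSO
    read_status: Callable[[], int]    # stands in for the device status
    service: Callable[[], None]       # the interrupt handling routine


def run_skip_chain(chain: List[ChainEntry]) -> Optional[str]:
    """Service the first device in the chain that requests an interrupt.

    Returns the name of the device serviced, or None if the end of the
    chain is reached (where the monitor would dismiss with XJEN CHnm).
    """
    for entry in chain:
        # CONSO skips when any of the masked status bits is set.
        if entry.read_status() & entry.interrupt_mask:
            entry.service()
            return entry.name
        # Otherwise continue with the next CONSO in the chain.
    return None


status = {"LPT": 0o0, "DSK": 0o10}    # invented status words
serviced = []


def make_entry(name: str, mask: int) -> ChainEntry:
    def service() -> None:
        serviced.append(name)
        status[name] &= ~mask         # clear the request, as CONO/DATAO would
    return ChainEntry(name, mask, lambda: status[name], service)


chain = [make_entry("LPT", 0o10), make_entry("DSK", 0o10)]
first = run_skip_chain(chain)
second = run_skip_chain(chain)
print(first, second)
```

The first pass services DSK and clears its request, so the second pass
reaches the end of the chain. If the service routine did not clear the
request, every pass would find DSK again, which is the looping
behavior the text warns about.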
The KL processor uses the following instructions to perform I/O:
DATAI CONI
DATAO CONO
BLKI CONSZ
BLKO CONSO
KS I/O processing uses the following set of instructions:
TIOxb
RDIOb
WRIOb
BCIOb
BSIOb
When the interrupt routine is completed, control returns to the
routine that was running before the interrupt (which may be another
device interrupt at a lower PI level). Each interrupt routine has its
own push-down list. The push-down lists are named CnxPD1, where n is
the CPU number (omitted for CPU0), and x is the interrupt level (from
1 to 6).
Device service routines preserve the state of the machine as it
existed before it was interrupted. They can use AC Block 0, as UUO
level does. Accumulators used by the interrupt routine are saved on
the stack before processing, and restored when processing is complete.
The SAVnx routines (n = CPU number, omitted if 0, and x = interrupt
level) are used to save/switch ACs during device interrupts. For
example, SAV1 is the routine to save the ACs for PI Level 1 on CPU 0;
SAV11 is the routine to save the ACs for PI Level 1 on CPU 1.
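The naming rules for the channel symbols, interrupt push-down lists,
and AC save routines follow one pattern, which the following Python
sketch reproduces. The function names are hypothetical; only the
symbol patterns (CHnm, CnxPD1, SAVnx) come from the text.

```python
def cpu_tag(cpu: int) -> str:
    """Return the CPU number as it appears in a symbol (omitted for CPU0)."""
    return "" if cpu == 0 else str(cpu)


def channel_symbol(cpu: int, level: int) -> str:
    """CHnm: the interrupt block for a PI level on a CPU."""
    return f"CH{cpu_tag(cpu)}{level}"


def pdl_symbol(cpu: int, level: int) -> str:
    """CnxPD1: the push-down list for an interrupt level (1 to 6)."""
    if not 1 <= level <= 6:
        raise ValueError("interrupt push-down lists exist for levels 1-6")
    return f"C{cpu_tag(cpu)}{level}PD1"


def save_routine(cpu: int, level: int) -> str:
    """SAVnx: the routine that saves the ACs for a PI level."""
    return f"SAV{cpu_tag(cpu)}{level}"


print(channel_symbol(0, 7), channel_symbol(2, 7))
print(save_routine(0, 1), save_routine(1, 1))
print(pdl_symbol(0, 4))
```

The printed names include CH7, CH27, SAV1, and SAV11, the examples
used in the text.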
Certain device interrupt routines have dedicated AC blocks, listed
below:
AC Block   Used for
   0       Exec-mode
   1       User-mode
   2       Terminal Scanner Interrupt Service
   3       File Interrupt Service
   4       Network Interrupt Service
   5       Reserved for Realtime Interrupt Service
   6       KL-paging Microcode
   7       Microcode
Interrupt service routines may also need to use the UPT of a job that
is waiting for the completion of I/O, rather than the current job. In
that case, the UBR and SPT must be modified to point to the correct
UPT, and then switched back when the interrupt is through. The
monitor routines that accomplish this are SVEUF, SVEUB, and SVPCS.
When you are examining a dump, be sure to check the correspondence
between the job and the UPT/SPT.
3.8.2 Vectored Interrupts
The KL hardware also uses vectored interrupts, which differ from the
standard, nonvectored interrupts in that the vectored interrupt goes
directly to the interrupt-handling routine, using a different
interrupt location in the EPT. The interval timer, the DTE (doorbell
function only), RH10s, and RH20s may do vectored interrupts.
The DTE interrupts to a location in the EPT, which is calculated as
follows:
EPT+142+10n ;where n is the DTE number (0-3)
For the RH10/RH20 devices, the system has an internal register called
IVIR (Interrupt Vector). When an RH10/RH20 device requests an
interrupt, the EBOX hardware/microcode dispatches to the location in
the EPT calculated as follows:
EPT+contents(IVIR)
This interrupt method allows the disk interrupt to vector to the
standard interrupt location for that channel, providing device
independence in the device interrupt handling routine. Thus, the disk
RH10 or RH20 can load the IVIR with 40+2n, and the magtape RH10 or
RH20 will dispatch directly into the middle of the skip chain to
service a specific controller.
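The two vectored-interrupt calculations can be written out as follows.
This Python sketch is illustrative only; the function names are
hypothetical, and it assumes that the n in 40+2n is the PI level, as
in the nonvectored case.

```python
def dte_interrupt_offset(dte_number: int) -> int:
    """EPT offset for a DTE vectored interrupt: 142 + 10n (octal)."""
    if not 0 <= dte_number <= 3:
        raise ValueError("DTE number must be 0 through 3")
    return 0o142 + 0o10 * dte_number


def rh_interrupt_offset(ivir_contents: int) -> int:
    """EPT offset for an RH10/RH20 interrupt: the contents of the IVIR."""
    return ivir_contents


def ivir_for_standard_location(pi_level: int) -> int:
    """IVIR value that vectors to the standard location (assumed 40+2n)."""
    return 0o40 + 2 * pi_level


dte_offsets = [f"{dte_interrupt_offset(n):o}" for n in range(4)]
print(dte_offsets)
print(f"{rh_interrupt_offset(ivir_for_standard_location(5)):o}")
```

The four DTE offsets come out as 142, 152, 162, and 172 (octal).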
3.9 TRAPS
Traps differ from interrupts in that they are caused by the execution
of a specific instruction rather than by some asynchronous event.
When a trap occurs, the microcode stores the current PC and flags in
the UPT. A new PC double-word, also in the UPT, specifies where
control will pass and in what mode the processor will operate (exec or
user mode).
3.9.1 Page Fail Traps
When a program attempts to access a page of data that is not
available, the hardware generates a page fail trap. A page fail trap
can occur for one of two reasons: the user tries to reference an
address that cannot be accessed (page not in memory, page
write-locked) or a hardware error (AR/ARX parity error, page table
parity error) occurs. When a page fail trap occurs, the processor
stores information about the trap in location 500 (.USPFW) of the
current UPT. This location is known as the page fail word.
The page fail word is formatted differently for a page reference that
is not available and for a hardware error. The page reference to an
address that cannot be accessed has the following format:
+-----------------------------------------------------------------------+
|U|1|Failure Code| |V| | Virtual Address |
+-----------------------------------------------------------------------+
0 1 2----------5 6-7 8 9-------12 13---------------------------------35
In either type of page failure, the virtual address is stored in Bits
13 through 35. Bit 0 is on if the page failure occurred in user
virtual address space. If Bit 0 is off, the failure occurred in
executive virtual address space.
If Bit 1 is on, a hardware-detected error occurred, and the failure
code is stored in Bits 1-5. Because Bit 1 is part of the code, every
failure code is 20 (octal) or greater. The failure codes are:
Code Meaning
20 No device response on UNIBUS (KS only)
21 Proprietary violation (KL only)
23 Address break (KL only)
24 Illegal indirect word in EA calc (KL only)
25 Page table parity error (KL only)
27 Section number in EA calc greater than 37 (KL only)
36 AR parity error (KL only)
37 ARX parity error (KL only)
If Bit 1 is off, Bits 2-7 have the following format:
+-----------+
|A|M|S|T|P|C|
+-----------+
2 3 4 5 6 7
Bit   Name   Meaning
 2    A      Indicates whether the mapping is valid (0 means a
             page refill is required).
 3    M      Indicates that the page has been modified.
 4    S      Reserved for use by the monitor.
 5    T      Indicates the type of page reference (0 for
             reading, 1 for writing).
 6    P      Indicates the page is public, if set.
 7    C      Indicates whether the page is cachable.
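Decoding a page fail word by hand is error-prone, so a small decoder
can be useful when working through a dump. The following Python sketch
follows the bit assignments given above; the function and key names
are hypothetical. The sample word, 113001,,552104, is the page fail
word shown in Example 1 of Section 3.13.

```python
HARDWARE_FAILURE_CODES = {
    0o20: "No device response on UNIBUS (KS only)",
    0o21: "Proprietary violation (KL only)",
    0o23: "Address break (KL only)",
    0o24: "Illegal indirect word in EA calc (KL only)",
    0o25: "Page table parity error (KL only)",
    0o27: "Section number in EA calc greater than 37 (KL only)",
    0o36: "AR parity error (KL only)",
    0o37: "ARX parity error (KL only)",
}


def bit(word: int, n: int) -> int:
    """Return bit n of a 36-bit word (PDP-10 numbering: bit 0 is leftmost)."""
    return (word >> (35 - n)) & 1


def decode_page_fail_word(word: int) -> dict:
    """Split a page fail word into the fields described in the text."""
    result = {
        "user_space": bool(bit(word, 0)),
        "virtual_address": word & ((1 << 23) - 1),    # Bits 13-35
    }
    if bit(word, 1):
        code = (word >> 30) & 0o37                    # Bits 1-5
        result["failure_code"] = code
        result["meaning"] = HARDWARE_FAILURE_CODES.get(code, "unknown code")
    else:
        for name, n in zip("AMSTPC", range(2, 8)):    # Bits 2-7
            result[name] = bit(word, n)
    return result


sample = (0o113001 << 18) | 0o552104
fields = decode_page_fail_word(sample)
address = fields["virtual_address"]
print(fields)
print(f"{address >> 18:o},,{address & 0o777777:o}")
```

For the sample word, Bit 0 is off (exec address space), T is 1 (a
write), and the address prints as 1,,552104, which agrees with the
comment in Example 1.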
At the same time the page fail word is stored, the flag-PC doubleword
is stored at .USPFP (location 501) in the UPT and control passes to
the address stored at .USPFP+2 (location 503), which usually contains:
EXP SEILM
Certain error handling routines modify .USPFP+2. If this location
does not contain SEILM, the cause of the crash may have been a failure
in an error recovery routine.
SEILM examines the page fail information stored in the UPT and breaks
down the code to find the specific cause of the problem. The
error-handling routines are described in Chapter 5.
Note that traps cannot be disabled, and they can occur during the
service of an interrupt. The flag-PC doubleword is used to return to
the correct location.
The page fail trap routine uses AC Block 0 and a push-down list in
the job's UPT.
3.10 CLOCK LEVEL
All functions that must be performed on a periodic basis are done at
clock level, in exec mode. Clock level may be entered in one of the
following ways:
o The clock ticked when the processor was in user mode.
o A UUO could not continue execution (was blocked).
o The null job was running and a new job became runnable.
o A UUO completed and a clock tick occurred previously, during
the processing of the UUO.
A full cycle occurs when the processor enters clock level as the
result of a clock tick; a partial cycle occurs when the processor
enters clock level as the result of a job blocking or the null job
detecting a newly runnable job. The full cycle starts at location
CLKINT; a partial cycle starts at WSCHED or SCDCHK.
A clock tick interrupt occurs at APR interrupt level but is
rescheduled to run at Level 7. The clock tick initiates accounting
and scheduling functions, then generates a PI Level 7 interrupt.
Only the software will generate a Level 7 interrupt. Level 7
interrupts and ANF-10 network interrupts are controlled by the
software. If the scheduler is running, a Level 7 interrupt will not
be processed.
During the full cycle, the monitor performs the following tasks:
o User time accounting
o System time accounting
o Processing timing requests
o Checking for hung devices
o Command processing (policy CPU only)
o Choosing a job to run
o Choosing a job to swap
On a partial cycle, the system only performs user time accounting and
then selects a job to run. A software interlock prevents a Level 7
interrupt from interrupting the partial cycle.
The scheduler uses the null job's push-down list, NUnPDL, and AC Block
0. When a partial or full cycle has been completed, the scheduler
prepares and runs either a user job or the null job.
3.11 ACCUMULATORS AND PUSH-DOWN LISTS
The first step in finding the correct push-down list (or stack) is to
get the right set of accumulators. When a crash occurs, the
accumulators are saved in the following places:
AC Block Location
0 .CnCA0 = .CnCAC = CRSHAC (for CPU0)
1 .CnCA1
2 .CnCA2
3 .CnCA3
4 .CnCA4
6 Portions of .Cn6
7 Portions of .Cn7
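When several crashes are being examined, it can help to generate the
save-area symbol for a given CPU and AC block. The following Python
sketch covers AC Blocks 0 through 4 only, combining the table above
with the AC block assignments listed in Section 3.8; the function and
dictionary names are hypothetical.

```python
AC_BLOCK_USE = {
    0: "Exec-mode",
    1: "User-mode",
    2: "Terminal Scanner Interrupt Service",
    3: "File Interrupt Service",
    4: "Network Interrupt Service",
}


def crash_ac_symbol(cpu: int, ac_block: int) -> str:
    """Return the .CnCAx symbol where an AC block is saved at crash time."""
    if ac_block not in AC_BLOCK_USE:
        raise ValueError("only AC Blocks 0-4 have a .CnCAx save area")
    return f".C{cpu}CA{ac_block}"


for block, use in AC_BLOCK_USE.items():
    print(crash_ac_symbol(0, block), use)
```

In a dump, the same locations can also be reached through the
current-CPU form of the symbol, as Example 2 in Section 3.13 does with
.CPCA3.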
The accumulators are stored when stopcode processing starts. The
error processing routines in the monitor use a special stack, ERnPDL.
If this is the current stack, be aware that an error may have occurred
within the error routine. You must do the mapping, or certain stacks
may be inaccessible. Once you have the correct accumulators, the
stack currently in use will become readily apparent. You should check
the stack to make sure the information in it appears to be current.
This information is fundamental to analyzing any crash, and it may
lead directly to the cause of the crash. Often crashes occur because
the ACs are misused, the stack is corrupted, or there is confusion in
the Priority Interrupt handling system. Software crashes are not
always the result of oversights in a complicated algorithm. However,
if the crash is due to a more obscure problem, you can use the
information you have gathered so far to begin your investigation of
the state of the software at the time of the crash.
You can continue your investigation of the crash by comparing the
state of the crash with the monitor sources. The following section
lists the more prominent monitor modules and their functions.
3.12 MONITOR ORGANIZATION
Like the hardware, the software is composed of modules. Each module
of the monitor is assembled separately, and then linked with the
others to make up the monitor. A module is a monitor source file that
contains related routines. For example, FILUUO deals with monitor
calls for file access.
The CLOCK1 module controls the following activities:
o Perform system time accounting
o Perform user time accounting
o Initiate terminal command processing (COMCON)
o Initiate scheduling (SCHED1)
o Initiate swapping (SWPSER)
o Perform job context switching
The modules called from UUO level are organized hierarchically. At
the highest level is the UUOCON module, which is responsible for UUO
preprocessing, dispatching to the correct routine, and cleaning up
after the function has been performed. It also contains the code for
some of the UUOs.
For I/O-related UUOs, UUOCON performs device-independent functions
before dispatching to a lower level for the device drivers. The
drivers are responsible for calling the specific modules that issue
the I/O instructions and start the transfers.
Most hardware interrupts enter the CONSO skip chain, which is in
COMMON. From there, control passes to the appropriate low-level I/O
module, or the skip chain may call a routine in the device driver.
Certain types of hardware generate vectored interrupts, which do not
access the skip chain.
3.12.1 Monitor Startup Modules
The monitor uses the following modules when it loads and starts the
system, discarding some of them when normal timesharing begins:
o SYSINI initializes devices and the monitor's data base in
preparation for timesharing. It performs system startup,
running an operator dialog to obtain date and time, and
performs device initialization. The monitor reclaims the
memory space used by SYSINI and uses it for dynamic storage.
o ONCMOD holds the routines related to disk units and file
structures. The monitor reclaims the memory for dynamic
storage.
o REFSTR refreshes file structures at startup time. The
monitor reclaims the memory for dynamic storage.
o PATCH contains extra space to patch the monitor during
timesharing. Patch space is reclaimed starting at the
location referenced by PATSIZ, and continues up. SYSINI and
patch space are preserved when the monitor is run with EDDT
loaded.
o AUTCON dynamically configures RH10, RH20, DX10, DX20, CI20,
NIA20, and most I/O bus hardware. The monitor does not
reclaim AUTCON memory space, because reconfiguration might be
required during timesharing.
The following are optional modules that can be omitted from the
monitor during monitor generation:
o CPNSER holds the routines that control the processors in a
Symmetrical MultiProcessing (SMP) system.
o CTXSER performs job context service.
o IPCSER handles the InterProcess Communications Facility
(IPCF).
o LOKCON locks jobs in core.
o PSISER handles the Programmable Software Interrupt (PSI)
service.
o QUESER controls the ENQ/DEQ facility.
o RTTRP allows for real-time programming.
3.12.2 Symbol Definition Modules
Some modules contain only symbols that are used by other modules.
They do not appear in the assembled monitor:
o F.MAC contains feature test switches.
o S.MAC contains system symbols.
o DEVPRM contains hardware device related symbols.
o DTEPRM contains DTE20 parameters.
o NETPRM contains network parameters.
o JOBDAT contains user job data area addresses.
o D36PAR contains DECnet parameters.
o SCPAR contains Session Control Parameters (DECnet).
o MACSYM contains DECnet macros.
o KLPPRM contains CI20 parameters.
o SCAPRM contains SCA parameters.
o MSCPAR contains MSCP driver parameters.
o ETHPRM contains Ethernet parameters.
3.13 EXAMPLES OF LOCATING FAILURES
The remainder of this chapter illustrates the crash analysis procedure
for three types of crashes. The examples display the information
gathered with the FILDDT patch file described in Section 2.5.
Comments have been added here to describe the information gained from
each command; in an actual command file, comments are illegal.
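The examples print 36-bit words as two octal halfwords separated by
two commas (for example, 700100,,4325). When a value must be carried
into a calculation outside FILDDT, the following Python helpers show
one way to convert between that notation and an integer. They are a
convenience sketch, not part of any DIGITAL tool, and the function
names are hypothetical.

```python
HALF_MASK = 0o777777    # 18 bits


def parse_halfwords(text: str) -> int:
    """Convert 'left,,right' (octal halfwords) to a 36-bit integer."""
    left, right = text.split(",,")
    return ((int(left, 8) & HALF_MASK) << 18) | (int(right, 8) & HALF_MASK)


def format_halfwords(word: int) -> str:
    """Convert a 36-bit integer to 'left,,right' in octal."""
    return f"{(word >> 18) & HALF_MASK:o},,{word & HALF_MASK:o}"


word = parse_halfwords("700100,,4325")
print(f"{word:o}")
print(format_halfwords(word))
```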
Example 1: IME Stopcode (Illegal Memory Reference in Exec Mode)
.RUN MONDDT ;Run the monitor-specific FILDDT
File: IME004 ;Enter crash file name
[Looking at file DSKT:IME004.EXE[30,5653,CAG]]
[Paging and ACs set up from the Exec Data Vector]
diecdb/ CPU0 ;Check that FILDDT found the
;right CDB
.cpslf/ CPU0 ;DIE agrees with FILDDT
.cpdwd/ 0 ;This CPU was in DIE
.cppgd[ 700100,,4325 ;Mapping information saved by
;DIE
.cptcx[ 700100,,4325 ;It matches that saved by
;SEILM
.uspfw/ DFDV NTLFRE#(P) ;The page fault word
=113001,,552104 ;A write attempt to 1,,NTLFRE
.USPFP/ CAIA 0 =304000,,0 ;The page fault PC flags
.USPFP+1/ P,,TIC+4 ;an address
$q/ XCT 0(T4) t4/ COMTIV+4 ;at which we find part of
;SCNSER's
1,,COMTIV+4/ MOVEM T1,CRSHWD+3(U) ;typein processing
u[ 1,,552051 _P,,NTLCKC#+4 ;However, U contains an
;apparent PC,
;rather than an LDB address
p/ .UUPMP+616,,NU0PDL+22 ;We are on the clock-level
;stack
1,,NU0PDL+22/ P,,CTICOM#+5 ^ ;The call within SCNSER which
;failed
1,,NU0PDL+21/ ADD 0 ^ ;some saved data
1,,NU0PDL+20/ P,,TTYCM7#+4 ;The return PC from the call
;to SCNSER
ttycm7? ;Where is this label defined?
COMCON ;In COMCON.
;This is part of the TTY
;command.
.cpcml/ P,,NTLCKC#+4 ;COMCON's saved LDB address
;has the same incorrect value
;as AC U.
.cpisf/ .UUPMP+602,,NU0PDL+6 $c ;However, COMCON's saved PDL
;pointer
1,,NU0PDL+6[ 4,,15772 ;points at a likely LDB
;address
$q+ldbclp/ CAIL U,43711 ;And this LDB has a command
;line
;pointer established
.-ldbclp+ldbtit/ CCI 43705 ;So we trace its input chunk
;stream
=1400,,43705 ;(POINT 12,addr,35)
ttchks=20 ;These chunks are 16 words
;long
4,,43705/ UNWNDC,,PLTS5A#+1 $12t ;12-bit ASCII, starting next
;word
4,,43706/ tt
4,,43707/ 21_
4,,43710/ 115
4,,43711/ ec
4,,43712/ ho^@
;"tt 21_115 echo" was the command being executed.
1,,ttycmd/ PUSHJ P,SSEC1 ;We proceed to trace the
;execution
1,,TTYCMD+1/ PUSHJ P,SAVE2 ;of the command to see where
;U got
1,,TTYCMD+2/ PUSH P,U ;clobbered.
1,,TTYCMD+3/ MOVE P1,U
1,,TTYCMD+4/ PUSHJ P,CTEXT1
1,,TTYCMD+5/ CAIE T3,JOBVER
=302200,,137 ;"_" is character code 137,
1,,TTYCMD+6/ JRST TTYC0# ;so we skipped this
;instruction,
1,,TTYCMD+7/ PUSHJ P,NTLCKJ ;and executed this code.
;NTLCKJ is called as a result of the NETDBJ macro
ntlckj/ PUSHJ P,NTCHCK ;This routine checks for
;nesting of
1,,NTLCKJ+1/ JRST NTLCKJ+3 ;the NETSER interlock (false)
1,,NTLCKJ+2/ POPJ P,0
1,,NTLCKJ+3/ SKIPE .CPISF ;It then checks for COMCON
;(true)
1,,NTLCKJ+4/ JRST NTLCKC#
1,,NTLCKC#/ PUSHJ P,NTLCKI ;Get the interlock
1,,NTLCKC#+1/ JRST ANFMDL+5 ;(failure branch not taken)
1,,NTLCKC#+2/ POP P,0(P) ;Proceed as a coroutine
1,,NTLCKC#+3/ PUSHJ P,@P(P)
u/ P,,NTLCKC#+4 ;This is the return address
;in U!
;At TTYCMD+2, we pushed U on the stack. We then called a coroutine.
;We should have called NTLCKJ before we pushed U onto the stack.
Example 2: UIL Stopcode (UUO at Interrupt Level)
.RUN MONDDT ;Run the monitor-specific FILDDT
File: uil002 ;Enter crash file name
[Looking at file DSKT:UIL002.EXE[30,5653,CAG]]
[Paging and ACs set up from the Exec Data Vector]
diecdb/ CPU0 ;Check that FILDDT found the
;right CDB
.cpslf/ CPU0 ;DIE agrees with FILDDT
.cpdwd/ 0 ;This CPU was in DIE
.usmuo/ CAIA 0 ;UUO PC flags
.USMUP/ BOOTPA =20 ;The UUO was in the ACs
.USMUE/ MAPBAX+1 =702432 ;UUO effective address
.USUPF/ TLNE T1,4 =603100,,4 ;AC block 3 was current,
$5u/ .CPCA0 ;but FILDDT set up AC block 0,
.cpca3$5u ;so we set up AC block 3 by
;hand.
p/ .UUPMP+623,,C4PD1+23 ;We have an interrupt level
;stack
C4PD1+23/ CAIA FREIN5#+5 ;which points to this return PC
FREIN5#+5/ JRST FREIN3# ^
FREIN5#+4/ PUSHJ P,CALMDA# ;We had called this routine to
;notify
CALMDA#/ MOVE T1,0(U) ;the MDA of a new disk unit
CALMDA#+1/ MOVEI T2,0
CALMDA#+2/ PUSHJ P,SNDMDC
CALMDA#+3/ POPJ P,0
CALMDA#+4/ JRST F ;Aha!
;An editing error would seem to be responsible.
;The "JRST F" should be a "JRST CPOPJ1".
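Example 2 reads the current AC block out of .USUPF. The following
Python sketch shows the arithmetic under an assumption that is not
stated in this guide: that the word uses the KL10 DATAI PAG layout,
with the current AC block in Bits 6-8 and the previous-context AC
block in Bits 9-11. Under that assumption the value 603100,,4 yields
block 3, which agrees with the comment in the example. The function
names are hypothetical.

```python
def field(word: int, last_bit: int, width: int) -> int:
    """Extract a field of a 36-bit word, given the number of its last bit."""
    return (word >> (35 - last_bit)) & ((1 << width) - 1)


def current_ac_block(word: int) -> int:
    """Bits 6-8 (assumed layout): the current AC block."""
    return field(word, 8, 3)


def previous_ac_block(word: int) -> int:
    """Bits 9-11 (assumed layout): the previous-context AC block."""
    return field(word, 11, 3)


usupf = (0o603100 << 18) | 0o4      # value shown in Example 2
print(current_ac_block(usupf), previous_ac_block(usupf))
```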
Example 3: KAF Stopcode (Keep-Alive Failure)
.RUN MONDDT ;Run the monitor-specific FILDDT
File: KAF003 ;Enter crash file name
[Looking at file DSKT:KAF003.EXE[30,5653,CAG]]
[Paging and ACs set up from the Exec Data Vector]
diecdb/ CPU0 ;Check that FILDDT found the
;right CDB
.cpslf/ CPU0 ;DIE agrees with FILDDT
.cpdwd/ 0 ;This CPU was in DIE
.cppgd[ 700100,,4325 ;Mapping information saved by
;DIE
.cpcpi[ 1,,777 ;CONI PI, result saved by DIE
kafloc/ XPCW @.CPKAF ;Where a KAF STOPCD gets its
;start
;(RSX20F does an XCT of this
;location.)
AP0KAF#
AP0KAF#/ CAIA 0 =304000,,0 ;PC flags
AP0KAF#+1/ P,,LOKNPI $c ;and location
AP0KAF#+2[ 4000,,0 $s ;new PC flags
AP0KAF#+3/ APRKAF ;and location
APRKAF/ MOVEM P,.CPSVP ;Where the real stack pointer
;was saved
/ .UUPMP+603,,NU0PDL+7 ;So we examine it
NU0PDL+7/ WRSLOC,,0 ^
NU0PDL+6/ P,,XMTECH#+17 ^ ;We're inside XMTECH in
;SCNSER
NU0PDL+5/ P,,TTDSC1#+1 ;from the call of XMTCHR in
;TTDINT.
NU0PDL+6/ P,,XMTECH#+17 ;Let's look for a loop in
;XMTECH.
$q/ JRST XMTCH1# ;We're about to restart
;XMTCHR
1,,XMTCH1#/ PUSHJ P,LOKSCI
1,,XMTCH1#+1/ SKIPE T1,W(U) ;Check for output state bits
$[ 100,,0 ;We have one,
1,,XMTCH1#+2/ JFFO T1,APCSET+11 ;so this jumps.
1,,APCSET+11/ JRST @XMTDSP#(T2)
1,,XMTDSP#/ SETZ XMTXFP# ;Bit 11 was set,
.+11./ SETZ XMTMIC# ;so we dispatch through this
;location,
1,,XMTMIC#/ MOVE T2,ARSLOC(U) ;getting here.
$[ 430400,,2 ;These are our LDBMIC bits
1,,XMTMIC#+1/ TLNE T2,20 ;(true)
1,,XMTMIC#+2/ SKIPE KAFLOC(U) ;(skipped)
1,,XMTMIC#+3/ JRST MICLG3#
1,,MICLG3#/ PUSHJ P,HPOS ;Get horizontal position
1,,HPOS/ PUSHJ P,SSEC1
1,,HPOS+1/ LDB T2,LDPWID ;Get terminal width setting
$1t/ 10 10 JOBBLT+4(U) ;(POINT 8,addr,35-8)
$[ 2000,,50020 ;from this value
$q'400=4,,120 ;Dropping the low-order 8
;bits reveals
1,,HPOS+2/ ADD T2,JOBERR+1(U) ;a width of ^O120
$/ -120 ;Adding this gives zero
1,,HPOS+3/ POPJ P,0 $
1,,MICLG3#+1/ JUMPN T2,XMTOK# ;(Branch not taken)
1,,MICLG3#+2/ SKIPE T2,ARSLOC(U) ;LDBMIC again
$[ 430400,,2
1,,MICLG3#+3/ TLNN T2,140 ;(false)
1,,MICLG3#+4/ JRST XMTOK1#
1,,XMTOK1#/ TLNE T2,40 ;(true)
1,,XMTOK1#+1/ JRST XMTECH# ;(skipped)
1,,XMTOK1#+2/ SKIPN KAFLOC(U)
$[ 0 ;(false)
1,,XMTOK1#+3/ JRST XMTCH2#
1,,XMTCH2#/ SOSGE T4,BOOTPA(U)
$[ 0 ;(non-skip)
1,,XMTCH2#+1/ JRST ZAPBUF#
1,,ZAPBUF#/ MOVSI T1,DTEDRW#+31
=205100,,200
1,,ZAPBUF#+1/ TDNE T1,W(U)
$[ 100,,0 ;(true)
1,,ZAPBUF#+2/ JRST ZAPPI1# ;(skipped)
1,,ZAPBUF#+3/ SETZM BOOTPA(U)
1,,ZAPBUF#+4/ MOVE T1,F(U)
$[ 1400,,37654
1,,ZAPBUF#+5/ CAME T1,R(U)
$[ 1400,,37654 ;(true)
1,,ZAPBUF#+6/ PUSHJ P,RCDSTP# ;(skipped)
1,,ZAPBUF#+7/ SKIPL SLJOBN#(U)
$[ 0 ;(false)
1,,ZAPBUF#+10/ JRST XMTECH#
1,,XMTECH#/ MOVE T1,JOBBLT+2(U)
$[ 200,,200115
1,,XMTECH#+1/ TLNE T1,100000 ;(true)
1,,XMTECH#+2/ JRST ECHCNR# ;(skipped)
1,,XMTECH#+3/ MOVE T1,JOBBLT+3(U)
$[ 10,,400
1,,XMTECH#+4/ TLNN T1,10 ;(true)
1,,XMTECH#+5/ TRZ T1,400 ;(skipped)
1,,XMTECH#+6/ SKIPL WRSINS+1(U)
$[ 0 ;(false)
1,,XMTECH#+7/ TRNE T1,400 ;(false)
1,,XMTECH#+10/ TRNE T1,3000 ;(true)
1,,XMTECH#+11/ TLNE T1,400 ;(skipped)
1,,XMTECH#+12/ CAIA 0
1,,XMTECH#+13/ JRST ECHCNR# ;(skipped)
1,,XMTECH#+14/ HLLZ T1,W(U)
$[ 100,,0
1,,XMTECH#+15/ JUMPE T1,XMTIDL# ;(branch not taken)
1,,XMTECH#+16/ PUSHJ P,UNLSCI
1,,XMTECH#+17/ JRST XMTCH1# ;We're back where we started.
;We have uncovered a loop in XMTCHR processing.
;Comparison with the source shows that this occurs when
;TTY DEFER is set and the line is under MIC control.
;This can be solved by inserting a "TLZ T1,LOLMIC" just before the
;"JUMPE T1,XMTIDL" at XMTECH+15.
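Two steps in Example 3 depend on bit arithmetic that is easy to get
wrong by hand: the JFFO dispatch on the output state bits (100,,0
selects Bit 11) and the LDB through LDPWID (an 8-bit byte with 8 bits
to its right, taken from 2000,,50020). The following Python sketch
reproduces both calculations. The function names are hypothetical and
the functions model only the parts of the instructions needed here.

```python
def jffo(word: int):
    """Return the number of the leftmost 1 bit (bit 0 is leftmost).

    Returns None for a zero word, where JFFO would not jump.
    """
    word &= (1 << 36) - 1
    if word == 0:
        return None
    return 36 - word.bit_length()


def ldb(word: int, position: int, size: int) -> int:
    """Load a byte: 'size' bits with 'position' bits to their right."""
    return (word >> position) & ((1 << size) - 1)


state_bits = 0o100 << 18                    # 100,,0
width_word = (0o2000 << 18) | 0o50020       # 2000,,50020

print(jffo(state_bits))
width = ldb(width_word, 8, 8)
print(f"{width:o}", width - 0o120)
```

The leftmost bit of 100,,0 is Bit 11, and the extracted width is 120
(octal). Adding -120 leaves zero, which is why the JUMPN at MICLG3+1
does not branch.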
CHAPTER 4
EXAMINING THE DATA STRUCTURES
After you have isolated the failure in the monitor code, you will need
to interpret the source code to make corrections. You must be able to
read and understand the source code, and compare it to the
instructions in the crash file.
For this purpose, the monitor uses symbols to represent almost all
values: bits, words, offsets, instructions, and more. Symbols make
the code easier to read and modify. This chapter describes the
conventions used in choosing symbolic names, and the tools for finding
the symbols in the source code.
4.1 SYMBOLS
This section describes the types of symbols, how they are named and
where they are stored. There is more information about symbolic
representation and usage in the MACRO Assembler Reference Manual.
The TOPS-10 software is made up of modules, each of which has its own
symbolic definitions. By default, a symbol is defined and used only
in a single module. The same symbolic name can be defined and used
differently by different modules.
A global symbol is available to modules other than the one in which it
is defined. The addresses of shared tables or commonly used
subroutines are examples of symbols defined as global.
4.1.1 Naming Conventions
TOPS-10 uses a consistent scheme for naming and using symbols. This
helps you read and understand the sources. For example, the monitor
accumulator locations have names that are consistent throughout most
of the monitor, and they have the following values:
Table 4-1: Monitor Accumulators
______________________________________________________________________
Number   Name   Description
______________________________________________________________________
  0      S      contains the I/O status word from a DDB (DEVIOS)
                while the monitor is processing I/O operations.
  1      P      contains the push-down list pointer currently in
                use.
  2      T1     is an unpreserved, temporary AC.
  3      T2     is an unpreserved, temporary AC.
  4      T3     is an unpreserved, temporary AC.
  5      T4     is an unpreserved, temporary AC.
  6      W      usually contains the pointer to the process data
                block (PDB) or the tape controller data block
                (KDB).
  7      M      contains the user virtual address for getting
                and putting data during UUO execution. During
                command processing, M contains the command
                dispatch bits.
 10      U      contains the Unit Data Block (UDB) address (for
                FILSER or TAPSER), or the Line Data Block (LDB)
                address in SCNSER.
 11      P1     is a preserved AC.
 12      P2     is a preserved AC.
 13      P3     is a preserved AC.
 14      P4     is a preserved AC.
 15      J      contains the job number, high segment number, or
                disk controller data block (KON) address at
                interrupt level.
 16      F      contains the DDB address during I/O. It is used
                as a temporary register in non-I/O situations.
 17      R      is a general-purpose, scratch AC.
______________________________________________________________________
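Because FILDDT types small values symbolically, these names also
appear in dumps where no accumulator is involved. In Example 1 of
Section 3.13, for instance, the value 1,,552051 is retyped as
P,,NTLCKC#+4, so a left half of P there means Section 1. The following
Python sketch converts between the AC numbers and names in Table 4-1;
the list and function names are hypothetical.

```python
# Index is the AC number; the table numbers them in octal (0-17).
AC_NAMES = ["S", "P", "T1", "T2", "T3", "T4", "W", "M",
            "U", "P1", "P2", "P3", "P4", "J", "F", "R"]


def ac_name(number: int) -> str:
    """Return the monitor's name for an accumulator number."""
    return AC_NAMES[number]


def ac_number(name: str) -> int:
    """Return the accumulator number for a monitor AC name."""
    return AC_NAMES.index(name.upper())


print(ac_name(0o10), ac_number("P"), f"{ac_number('R'):o}")
```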
The uses for each accumulator may change from one release of the
software to the next. You should always check the source code to see
how the program uses a specific accumulator in a specific situation.
To restore accumulators correctly, several standard subroutine return
sequences have been set up. The main subroutine does a JRST to one of
the following locations:
Subroutine Function
CPOPJ Regular POPJ return
CPOPJ1 Increment return address and then POPJ (skip return)
CPOPJ2 Double skip return
TPOPJ Restore T1 and return
TPOPJ1 Restore T1 and skip return
T2POPJ Restore T2 and return
T2POJ1 Restore T2 and skip return
MPOPJ Restore M and return
FPOPJ Restore F and return
FPOPJ1 Restore F and skip return
WPOPJ Restore W and return
JPOPJ Restore J and return
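The effect of these return sequences on the stack can be modeled as
follows. This Python sketch is a simplified model that treats the
stack as a list and the return PC as an integer; the function
signatures and the sample values are hypothetical, and only the T1
variants of the restore routines are shown.

```python
def cpopj(stack: list) -> int:
    """Regular return: pop the return address."""
    return stack.pop()


def cpopj1(stack: list) -> int:
    """Skip return: pop the return address and add 1."""
    return stack.pop() + 1


def cpopj2(stack: list) -> int:
    """Double skip return: pop the return address and add 2."""
    return stack.pop() + 2


def tpopj(stack: list, acs: dict) -> int:
    """Restore T1 from the stack, then return."""
    acs["T1"] = stack.pop()
    return cpopj(stack)


def tpopj1(stack: list, acs: dict) -> int:
    """Restore T1 from the stack, then skip return."""
    acs["T1"] = stack.pop()
    return cpopj1(stack)


stack = [0o1001, 42]          # return address, then the saved T1
acs = {"T1": 0}
next_pc = tpopj1(stack, acs)
print(f"{next_pc:o}", acs["T1"], stack)
```

A routine that exits through one of these sequences with an extra or
missing word on the stack returns to the wrong address, which is one
way the stack corruption mentioned in Chapter 3 can arise.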
Symbolic names for locations in the monitor are one to six characters
in length. Usually, all six characters are used. The first three
characters identify the data structure and type of symbol; the last
three describe the unique word or field.
Symbols for data structures usually take one of two forms:
dddxxx
.ddxxx
where ddd or dd represents the data structure and xxx represents the
field or word. Some data structures are:
Symbol Data Structure
.C0xxx CPU data block for CPU0 (in low segment)
.C1xxx CPU data block for CPU1 (low segment)
.Cnxxx CPU data block (n = CPU number)
.CPxxx CPU data block for current CPU (high segment)
.PDxxx Process data block
.USxxx User Process Table
.CTxxx Context block offsets
.CXxxx Context saved parameters block offsets
ACCxxx Access table
BAFxxx Bad allocation file block
CHNxxx Channel data block
DEVxxx Device data block
HOMxxx Home blocks
JBTxxx Job tables
JOBxxx Job data area
KDBxxx Common controller data block
KONxxx Disk controller data block
LDBxxx Line data block
NMBxxx File name block
PPBxxx Project programmer number data block
RIBxxx Retrieval information block
SABxxx Storage allocation block
STRxxx File structure data block
TKBxxx Tape controller data block
TTFxxx Forced command table
TUBxxx Magnetic tape unit data block
UDBxxx Common unit data block
UFBxxx UFD data block
UNIxxx Disk unit data block
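The prefix convention makes it possible to guess the data structure
behind an unfamiliar symbol. The following Python sketch looks up the
first three characters of a symbol in a subset of the table above. The
function and dictionary names are hypothetical, and a result of None
means only that the prefix is not in this subset.

```python
import re

STRUCTURE_PREFIXES = {
    ".CP": "CPU data block for current CPU",
    ".PD": "Process data block",
    ".US": "User Process Table",
    "DEV": "Device data block",
    "JBT": "Job tables",
    "KON": "Disk controller data block",
    "LDB": "Line data block",
    "UNI": "Disk unit data block",
}


def describe_symbol(symbol: str):
    """Return the data structure suggested by a symbol's prefix, or None."""
    prefix = symbol.upper()[:3]
    per_cpu = re.fullmatch(r"\.C(\d)", prefix)     # .C0xxx, .C1xxx, ...
    if per_cpu:
        return f"CPU data block for CPU{per_cpu.group(1)}"
    return STRUCTURE_PREFIXES.get(prefix)


for name in ("LDBCLP", ".CPISF", ".C1CA3", ".USPFW", "TTYCMD"):
    print(name, describe_symbol(name))
```

Not every monitor symbol follows the convention, so the result should
be treated as a hint to confirm in the CREF listing.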
Byte pointers referencing fields within these data structures are
named in the following way:
aacbbb
where:
aa represents the first two letters of the three-letter data
structure name, c represents one of Y, M, B, P, S, or N, and bbb
represents the name of the pointer.
For example, a pointer in the BAF block is named BAYbbb.
Bits within words are usually defined as one of the following:
xx.yyy
xxPyyy
where:
xx is the data structure and yyy is the bit name.
Here are some examples:
TO.yyy Bits in CONO TIM,
TI.yyy Bits in CONI TIM,
LI.yyy Bits in CONI/CONO PI,
LP.yyy Bits in CONO/CONI APR,
JS.yyy Bits in JBTSTS (job status word)
4.1.2 Symbol Files and Monitor Generation
Several of the monitor modules contain only symbol definitions. They
are used to define the software features and hardware configuration in
the process of building the monitor.
The first step in generating the monitor is to run the MONGEN program
(MONitor GENerator). It asks a series of questions about the hardware
configuration and the software options to be selected. For more
information about the MONGEN program, refer to the TOPS-10 Software
Installation Guide.
MONGEN creates symbol-definition files that describe the aspects of
the system. After running MONGEN, the system installer can build the
monitor with standard source code libraries, or, if changes have been
made to the sources, the monitor must be built from separate modules.
If the systems programmer does not want to make any changes to the
standard release of TOPS-10, the programmer compiles the common
modules and loads them with a distributed library file of the
remaining monitor modules.
It is common practice, however, to make modifications to the TOPS-10
source code. If changes have been made to one or more TOPS-10 source
modules, the modules of the monitor must be assembled separately to
build a library file.
Next, the MONGEN files must be assembled with the monitor's common
modules, which are:
o COMMOD defines the disk data base.
o COMDEV defines all other devices.
o COMMON describes the CPU, memory, scheduler, job tables, and
so forth.
4.2 READING THE CODE
There are two important sources of information in analyzing system
crashes: the crash file and the monitor source code. The key to
successful crash analysis is to be able to compare the crash file and
the source code. Refer to the TOPS-10 MACRO Assembler Reference
Manual for information about the source code and assembler language
conventions.
4.2.1 How to Use a CREF Listing
The listings of the monitor source code should be cross-referenced
(CREF) listings. You will find a CREF listing more useful than
unassembled source code because CREF produces a sequence-numbered
assembly listing, followed by tables showing where symbols are defined
and referenced. To find a symbol in a module, you need only look in
one of these tables, which points to a line number in the assembly
listing. The CREF program is described in the TOPS-10 User Utilities
Manual.
4.2.2 Macros
A macro is a frequently used sequence of instructions that can be
called with a single pseudo-instruction. A macro allows the system
programmer to supply arguments to a single pseudo-instruction, which
the assembler expands to the desired instruction(s). Macros can make
the code difficult to read, however, unless you understand the
purpose of some commonly used macros.
Several macros are used to define symbols. These macros are defined
in S.MAC:
o XP (A,B) defines the global symbol A as being equal to B, but
DDT will not display A (A==:B).
o ND (A,B) defines A as a global symbol equal to B using the XP
macro, if A has not already been defined.
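The difference between XP and ND is that ND supplies a default and
leaves an existing definition alone. The following Python sketch
models only that difference, using a dictionary as a stand-in for the
assembler's symbol table. It does not model global symbols or DDT
suppression, and the symbol names A and B are hypothetical.

```python
symbols = {}


def xp(name: str, value: int) -> None:
    """Model of XP: define the symbol unconditionally."""
    symbols[name] = value


def nd(name: str, value: int) -> None:
    """Model of ND: define the symbol only if it is not already defined."""
    if name not in symbols:
        xp(name, value)


xp("A", 1)
nd("A", 2)      # A is already defined, so it keeps the value 1
nd("B", 3)      # B is not defined, so it takes the default 3
print(symbols)
```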
There are many other commonly used macros in the monitor, including:
o $XHGH, $HIGH, $LOW, $CSUBS, and $ABS, which place code in the
extended high segment, high segment, low segment, common
subroutines, and an absolute physical location, respectively.
Code usually goes in the monitor's high segment, which is
write-protected; data goes in the low segment, which is
writable. $ABS is usually used to place data in physical
Page 0 of memory (Words 0-777).
o Ordinarily, an instruction in a user program is executed
entirely in user address space, and an instruction in the
monitor is executed in the executive address space. But to
facilitate communication between the monitor and users, the
monitor can execute instructions to refer to locations in the
other address space. This feature is implemented by the
previous context execute (PXCT) instruction. The following
macros allow you to execute PXCT:
1. EXCTUX moves information from the user's address space to
the monitor.
2. EXCTXU moves information from the monitor's address space
to the user's.
3. EXCTUU moves information from one location in the user's
address space to another.
o The USERAC and EXECAC macros generate code to switch between
accumulator blocks. USERAC switches to AC Block 1. EXECAC
switches to the monitor's AC block. If no argument is given,
the switch is made to AC Block 0. If an argument is given,
the AC block specified by the argument is used.
4.2.3 Conditional Assembly
Parts of the monitor are assembled on an optional basis, depending on
conditions defined by an assembler IF statement.
F.MAC has most of the symbol definitions that are used for conditional
assembly. Most symbols are of the form FTxxxx, where FT stands for
Feature Test and xxxx is the specific option. Some of the feature
test symbols and the functions they enable are:
FTKL10 KL10 processor
FTKS10 KS10 processor
FTMP SMP (multiple-processor) system
FTDUAL Dual-ported disks are supported
4.2.4 Finding Symbols
When trying to find a symbol in the monitor, you should follow these
steps:
1. Check the symbol table at the back of the CREF listing you
are currently looking at. If one of the numbers after the
symbol name has a pound sign (#) next to it (as in number#),
the symbol is defined on that line of the code. If the
symbol appears in the CREF listing with no line numbers that
have pound signs, the symbol is global, or it is defined in a
universal file.
2. If a symbol is defined in a universal file, check your CREF
listings of S.MAC, DEVPRM.MAC, DTEPRM.MAC, NETPRM.MAC,
MACSYM.MAC, and JOBDAT.MAC. If the symbol is not defined in
any of these modules, the symbol is probably global.
3. If the symbol is not defined in the source module or the
universal files, you must obtain a GLOB listing of the
monitor. The GLOB listing points to the modules where global
symbols are defined and used. Search the symbol tables at
the back of those modules. (GLOB creates listings of global
symbols from binary files. It is described in the TOPS-10
User Utilities Manual.)
4. If you are not successful in searching the listings, run the
monitor-specific FILDDT and use the "symbol?" command: type the
symbol name followed by a question mark, and FILDDT displays
the module where the symbol is defined.
Monitor parameters used by certain modules are often
associated with global symbols that are defined in those
modules. LINK can detect the parameters that are assigned
different values by different modules. FILDDT lists only one
module where each global symbol is defined, and displays a
"G" next to global symbols. If a symbol is not global,
several modules may be listed as containing the symbol. You
can unlock the local symbols for a certain module by issuing
the following FILDDT command:
module$:
The monitor uses many fixed and dynamic data structures for job
control, for memory management, and for device control. Some of the
data structures that are important for crash analysis are described
briefly in the following sections. For more specific information
about the contents of these data structures, refer to the TOPS-10
Monitor Tables descriptions.
4.3 JOB-RELATED DATA STRUCTURES
Information about a job is kept in the monitor's low segment or in
per-process address space (such as the UPT and JOBDAT). Most of the
following data structures are job tables, and have JBT as the first
three letters of the symbolic name (an exception is TTYTAB). Most job
tables have one entry in the table per job. Some of these tables also
have entries for high segments, because the monitor sometimes treats
high segments like jobs.
The following job tables hold information about the status and
condition of the job:
o JBTSTS, JBTST2, and JBTST3 contain the current state of the
job, including the processor queue, execution status,
swapping status, event wait condition, and whether the job is
logged in.
o JBTCQ and JBTCSQ hold the processor queue number, subqueues,
and scheduler class for each job. These tables are organized
as a series of linked lists.
o JBTSWP holds the disk address of the swapped-out job.
The following tables hold the features and options for the job:
o JBTPRV holds the job's privileges.
o JBTSPL holds the spooling bits for the job. These control
how and when requests to spooled devices (LPT, PLT, and so
forth) are handled.
o JBTSCD holds the job's scheduler class.
o JBTWCH controls the WATCH information displayed by the
monitor for the job.
o JBTLIM holds the CPU run-time limit for the job. The monitor
checks this value before processing batch jobs.
The following tables describe the user and the program being run:
o JBTNAM holds the program name.
o JBTPPN holds the project-programmer number.
o JBTLOC holds the ANF-10 node number for remote spooling.
o JBTUPM, a component of the SPT, points to the physical page
of this job's UPT when the job is swapped in.
The following tables are used to point to the location of another
job-related table:
o JBTSGN contains the address of the job's high segment
descriptor blocks.
o JBTPDB holds the address of the job's Process Data Block (the
PDB).
The Process Data Block (PDB) stores more job-related information,
including:
o User name (in SIXBIT)
o Accumulated run-time, core and disk usage
o Virtual memory limits
o IPCF information
o Current program name and directory
o The job's search list
o Context flags, quotas, and chain pointers
The words in the PDB are named .PDxxx, where xxx is the specific word.
The remainder of the job-related information is stored with the job
itself in JOBDAT or the UPT. JOBDAT holds the user accumulators when
the job is not running, the starting address of the program, the
addresses of DDT and the symbol table, and other locations required to
run the program.
4.4 CPU DATA STRUCTURES
The CPU Data Block (CDB) contains most of the CPU-specific
information. On a multi-processing system of two or more KL
processors, the monitor maintains a different CDB for each processor.
The CDB is divided into two sections: one for constant definitions
and the other for variable definitions. The constants area holds such
information as the following:
o CPU number
o Instructions to execute in certain situations, such as device
interrupts
o Bit masks
o Hardware constants
The variables area stores such information as:
o Stopcode information
o Hardware error information
o Performance information
o Frequency of certain events
o Per-CPU patch space
The CDB words are named .CPxxx or .Cnxxx, where n is the CPU number
and xxx is the unique symbol for the word. On a single-CPU system,
the .CPxxx format is always valid. In a multi-CPU system, .CPxxx
refers to the current CPU (or, in FILDDT, the CPU that is currently
mapped). To refer to the data on a CPU other than the one you are
currently accessing, use the .Cnxxx format, replacing n with the
CPU number (0 through 2).
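The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention; the function name cdb_symbol is hypothetical and is not part of the monitor or of FILDDT.

```python
def cdb_symbol(word, cpu=None):
    """Build a CDB symbol name from the unique word name.

    With no CPU number the .CPxxx form (current or mapped CPU) is
    returned; with a CPU number the .Cnxxx form is returned.
    The 0 through 2 range follows the text above.
    """
    if cpu is None:
        return ".CP" + word
    if not 0 <= cpu <= 2:
        raise ValueError("CPU number must be 0 through 2")
    return ".C%d%s" % (cpu, word)


print(cdb_symbol("OK"))      # symbol for the current CPU
print(cdb_symbol("OK", 1))   # symbol for CPU1
```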
The COMMON module contains the CWRD macro to define constants and
variables in the CPU Data Block (CDB). CWRD is called in the
following way:
CWRD (nam, val, len, lbl)
where:
nam   is the word name
val   is the optional value to store in this address (default=0)
len   is the optional length of the storage area (default=1)
lbl   is the optional alternate label for old-style CPU0 references
For example, the following instruction defines .CnOK as a global
symbol with a value of -1:
CWRD (OK,-1)
As another example, the following instruction defines .CnACN as a word
in the CDB variables area, with the alternate name APRSTS:
CWRD (ACN,,1,APRSTS)
The scheduler uses a series of tables to control the use of the CPU.
Some of the scheduler tables are:
o QBITS determines how the scheduler should move a job from one
wait state to another.
o SSCAN and SQSCAN tell the scheduler the order and direction
the run queues should be scanned to find a runnable job.
o Transfer tables control the destination queue for requeued
jobs.
The AVALTB table contains flags to indicate whether a sharable
resource has become available. A sharable resource is a portion of
the monitor that can only be used by one process at a time.
Some of the sharable resources are:
Name   Resource
AU     Alter UFD (one per UFD, per structure)
CX     PDB/context block interlock word (one per job)
DA     Allocate disk space (one per disk unit)
EV     Use executive virtual memory
MM     Memory management (for modifying the data base)
REQTAB contains the number of jobs waiting for each resource. A value
of -1 in REQTAB indicates that the resource is available; a value of
zero means that a job has the resource and no other job is waiting.
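When reading a crash dump, a REQTAB word has to be treated as a signed 36-bit quantity before the rule above can be applied. The following Python sketch shows one way to do that. The function names are hypothetical, and the reading of a positive value as "that many jobs waiting" is an assumption drawn from the description of REQTAB as a count of waiting jobs.

```python
WORD_MASK = (1 << 36) - 1
SIGN_BIT = 1 << 35


def to_signed36(word):
    """Interpret a 36-bit word as a two's complement integer."""
    word &= WORD_MASK
    return word - (1 << 36) if word & SIGN_BIT else word


def describe_reqtab(word):
    """Describe the state of a sharable resource from its REQTAB word."""
    count = to_signed36(word)
    if count < 0:
        return "available"
    if count == 0:
        return "held, no other job waiting"
    # Assumption: a positive value counts the jobs waiting.
    return "held, %d job(s) waiting" % count


print(describe_reqtab(0o777777777777))  # -1 as it appears in a dump
print(describe_reqtab(0))
```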
INTTAB describes each hardware interrupt routine. Each two-word entry
contains the PI level, the address of the DDB (or prototype DDB), and
the CPU to which the device is connected.
4.5 MEMORY DATA STRUCTURES
The monitor uses PAGTAB and PT2TAB to allocate user and monitor memory
space (usually referred to as "core"). The tables contain one word
for each page of physical memory. A job's allocation of pages is
maintained as a forward linked list using PAGTAB, and as a backward
linked list with PT2TAB. All the pages for a job are linked using the
right half of a PAGTAB and PT2TAB entry. PAGPTR contains the starting
address for the linked list of free pages. The left half of the
PAGTAB and PT2TAB entries contain bits describing how the page is
used: whether it is locked, locked in executive virtual memory, and
so forth. The monitor uses PT2TAB to obtain information about
swapped-out pages.
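Following a job's page list by hand means repeatedly taking the right half of a PAGTAB entry and using it as the next index. The Python sketch below illustrates the idea on a made-up table. It assumes that the table is indexed by physical page number and that a right half of zero ends the list; neither detail is stated above, so check both against the Monitor Tables descriptions before relying on them.

```python
HALF_MASK = 0o777777


def split_halves(word):
    """Return (left half, right half) of a 36-bit word."""
    return (word >> 18) & HALF_MASK, word & HALF_MASK


def walk_page_chain(pagtab, first_page):
    """Follow forward links from first_page and return the page numbers.

    pagtab maps page number -> 36-bit entry (a dict stands in for the
    table here).  A zero link is assumed to end the list.
    """
    pages = []
    seen = set()
    page = first_page
    while page != 0:
        if page in seen:
            raise ValueError("loop in page chain at page %o" % page)
        seen.add(page)
        pages.append(page)
        _flags, page = split_halves(pagtab[page])
    return pages


# Hypothetical three-page chain: 152 -> 153 -> 160 -> end
example = {0o152: 0o153, 0o153: (0o400000 << 18) | 0o160, 0o160: 0}
print([oct(p) for p in walk_page_chain(example, 0o152)])
```

The loop check matters in crash analysis, because a corrupted link is exactly the kind of damage you may be looking for.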
MEMTAB also has one entry for each page in memory. The monitor uses
MEMTAB during swapping and paging requests, to keep track of where
pages are stored in the swapping area and which page to transmit next.
The monitor also maintains areas of dynamic storage called free core,
allocated in four-word chunks, using a bit table to determine which
chunks are in use and which are not.
4.6 COMMAND PROCESSING TABLES
The command processor uses several tables to verify and control
monitor commands, including COMTAB, DISP, and UNQTAB. COMTB2, DISP2,
and UNQTB2 are used to describe SET commands. COMTBC, DISPC, and
UNQTBC are for customer use.
TTFCOM is the forced commands table. This table is used if the
monitor determines that a job must execute a command immediately,
regardless of the job's current state. The monitor does not place the
commands in the TTFCOM table into a terminal input buffer before
processing the command.
4.7 UUO PROCESSING TABLES
UUOTAB contains the addresses of the operator-dependent UUO routines.
The addresses are arranged in order of UUO opcode, with one halfword
devoted to each address. The UUO handler verifies whether the UUO is
valid and dispatches to the address stored in UUOTAB. If the UUO is
illegal, control passes to an error routine called UUOERR.
The tables UCLJMP and UCLTAB are used for the CALL and CALLI UUOs.
UCLTAB contains the names for the CALL UUOs; UCLJMP contains the
addresses of the CALL/CALLI routines.
4.8 I/O DATA STRUCTURES
The most dynamic and interrelated data structures in the monitor are
those related to I/O. The data structures that are common to almost
all I/O operations are the Job Device Assignment table (JDA), the
device data block (DDB), and user I/O buffers. Other data structures
exist to control specific types of hardware: disk or tape units,
device controllers, or software I/O channels. For certain devices
(such as disk), an extra level of organization is imposed: the
logical file structure, requiring additional data structures.
4.9 THE JOB DEVICE ASSIGNMENT TABLE
The Job Device Assignment table (starting at USRJDA in the UPT) holds
the addresses of the DDBs currently in use by the job. It is indexed
by the software channel number. When the user issues a UUO to
initiate I/O, a software channel number must be supplied, which is
associated with the device or file to be accessed. More channels are
available in the extended channel table, stored in funny space.
Extended channel table entries are in the same format as the JDA
table. The contents of .USCTA in the UPT point to the extended
channel table.
The left half of the JDA entry for a channel contains status bits that
indicate which UUOs have been successfully completed for this channel.
Following are some of the status bits, which are defined in S.MAC:
Bit   Symbol   Meaning
0     INITB    An OPEN or INIT has been done on this channel.
1     IBUFB    INIT specifying input buffers was done.
2     OBUFB    INIT specifying output buffers was done.
3     LOOKB    LOOKUP was done.
4     ENTRB    ENTER was done.
5     INPB     INPUT was done.
6     OUTPB    OUTPUT was done.
7     ICLOSB   CLOSE (input side of channel) was done.
8     OCLOSB   CLOSE (output side of channel) was done.
9     INBFB    INBUF was done.
10    OUTBFB   OUTBUF was done.
11    SYSDEV   System device, or [1,4] for disk area.
12    RENMB    RENAME UUO in progress.
13    RESETB   RESET UUO in progress.
4.10 THE DEVICE DATA BLOCK
The monitor uses the Device Data Block (DDB) to control each device.
The information in the DDB comes from a monitor call and is read by
the interrupt handling routine to perform the I/O. The handler
records the status of the operation in the DDB. The monitor and the
user can read the status of the I/O operation from the DDB. For
example, the monitor can detect a hung condition by checking a timer
in the DDB.
User programs can include the same instructions to perform I/O with
disk devices, magnetic tapes, and line printers, because the format of
the DDB is similar for all devices. The monitor handles the devices
differently by handling the DDBs differently and by ignoring any
information in the DDB that is not relevant to the specific device.
For example, the monitor creates DDBs for single-user devices when the
system comes up; these DDBs are never deleted. The monitor simply
updates the information in the data block. For sharable devices, such
as disk devices, the monitor creates DDBs dynamically in the user's
funny space, when a channel is opened. The DDB for the channel is
deleted when the channel is closed. Spooled devices, such as line
printers, are handled in a similar manner.
A device on an ANF-10 network front-end requires a special kind of
DDB, because remote stations can have line printers or card readers.
When a user first accesses the remote device, NETSER creates a DDB for
the device. COMDEV contains the prototype network DDB.
NETDEV contains the I/O routines for specific network devices. For
example, the RDXSER routine, in NETDEV, handles RDA devices, and the
TSKSER routine handles intertask communication.
DTESER contains the DTE device handling routine for DECnet front-ends
(DN20s running MCB software). The DTE DDB is dynamically created for
the purpose of loading and dumping the front-end memory.
All DDBs include the following locations:
o DEVNAM contains the SIXBIT device name.
o DEVBUF contains the addresses of the user buffers.
o DEVMOD describes the type of device.
o DEVIOS is the I/O status word.
o DEVSER contains a pointer to the next DDB and the address of
the dispatch table.
Most devices are configured dynamically by the monitor. A prototype
DDB exists for each type of device. When a recognized hardware device
is detected by the monitor, a DDB is created and the contents of the
prototype DDB are copied into the new DDB. Then, specific information
(device names, unit numbers, and so forth) is filled in. Prototype
DDBs are linked into the DEVLST chain. They may also be found by
indexing into DDBTAB using the .TYxxx value for the device in
question. For example, .TYMTA has a value of 2. DDBTAB+2 contains
the address of the prototype magtape DDB.
Device              Module   DDB      Hardware Interface
Card reader         CDRSER   CR1DDB   CR10 I/O BUS
                    DCRSER   DCRDDB   CD20/RSX-20F
Card punch          CDPSER   CDPDDB   CP10/CP10D I/O BUS
Line printer        DLPSER   DLPDDB   LP20/RSX-20F
                    LP2SER   LP2DDB   LP20/UNIBUS (KS10 only)
                    LPTSER   LPTDDB   BA10/LP100 I/O BUS
Magtape             TAPUUO   TDVDDB   All interfaces
Plotter             PLTSER   PLTDDB   XY10 I/O BUS
Paper tape reader   PTRSER   PTRDDB   CR04 I/O BUS
Paper tape punch    PTPSER   PTPDDB   CR04 I/O BUS
4.11 FINDING DDB INFORMATION
The following example shows how to look at a crash file to find the
DDBs and other information about I/O. In this example, Job 7 was
running LPTSPL. You must first issue the mapping command ($6U), to
map the UPT through Job 7, rather than through the UPT for the job
that was currently running. A typical command sequence might be:
JBTNAM 7$6T/ LPTSPL
JBTUPM 7[ 42000,,152 .-n$6U
where n is the CPU number of the CPU that is currently mapped.
The commands to look at the user job device assignment table are:
USRJDA[ 506000,,65334 ;Channel 0
.UPMP+652[ 506000,,65414 ;Channel 1
.UPMP+653 0 ;Channel 2
.UPMP+654 0 .
.UPMP+655 0 .
.UPMP+656 0 .
.UPMP+657 0
.UPMP+660 0
.UPMP+661 0
.UPMP+662 0
.UPMP+663 0
.UPMP+664 0
.UPMP+665 0
.UPMP+666 0
.UPMP+667 0
.UPMP+670 0 ;Channel 17 (octal)
The commands to display the devices associated with the DDBs are:
$6T 65334/ LPT0
65414/ LPT1
Both devices are printers, controlled by LPTSPL.
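The $6T typeout mode used above displays a word as six SIXBIT characters. If you are working from an octal dump instead, the same conversion can be sketched in Python. SIXBIT stores each character as its ASCII code minus 40 (octal), six characters to a 36-bit word. The function names are hypothetical.

```python
def sixbit_to_text(word):
    """Decode one 36-bit word as six SIXBIT characters."""
    chars = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))   # SIXBIT code + 40 octal = ASCII
    return "".join(chars).rstrip()       # trailing blanks are padding


def text_to_sixbit(text):
    """Encode up to six characters as a left-justified SIXBIT word."""
    if len(text) > 6:
        raise ValueError("SIXBIT word holds at most six characters")
    word = 0
    for ch in text.upper().ljust(6):
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError("character %r has no SIXBIT code" % ch)
        word = (word << 6) | code
    return word


print("%012o" % text_to_sixbit("LPT0"))
print(sixbit_to_text(0o546064200000))
```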
The left half of each JDA entry contains bits indicating the UUOs
executed for that channel. The left half of the JDA entry shown above
contains 506000, which indicates Bits 0, 2, 6, and 7 turned on. These
bits are set for the following UUOs:
Bit 0 OPEN/INIT
Bit 2 INIT specifying output buffers (OBUFB)
Bit 6 OUTPUT
Bit 7 CLOSE (input side, as input is not allowed in LPTs)
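Picking the bits out of an octal left half by eye is error-prone. The following Python sketch decodes the left half of a JDA entry into the status bit symbols listed for the Job Device Assignment table. The symbol names come from that list; the function name is hypothetical. Bit 0 is the leftmost bit of the 18-bit half word.

```python
# Status bit symbols in bit order, bit 0 first (from S.MAC as listed
# in the Job Device Assignment table section).
JDA_BITS = [
    "INITB", "IBUFB", "OBUFB", "LOOKB", "ENTRB", "INPB", "OUTPB",
    "ICLOSB", "OCLOSB", "INBFB", "OUTBFB", "SYSDEV", "RENMB", "RESETB",
]


def decode_jda_left_half(left_half):
    """Return the symbols of the status bits set in a JDA left half."""
    names = []
    for bit, name in enumerate(JDA_BITS):
        mask = 1 << (17 - bit)       # bit 0 is the high-order bit
        if left_half & mask:
            names.append(name)
    return names


print(decode_jda_left_half(0o506000))
```

For the value 506000 this yields INITB, OBUFB, OUTPB, and ICLOSB, that is, Bits 0, 2, 6, and 7.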
The user buffers are the next source of information. Find the output
buffer for LPT1 by examining the left half of the DEVBUF word in the
DDB, which holds the address of the output ring header:
65414+DEVBUF/ 45150,,0 ;output-header,,input-header
The user buffers are always in user address space. To examine
locations in user address space, switch mapping to the user job.
JBTUPM shows that the UPT starts at 152; therefore, the command to
switch mapping to user space is:
152$1U
Now you can examine the contents of the output ring header:
45150/ 44351 ;Current buffer addr+1
45151/ 10700,,0 ;Byte pointer
45152/ -1 ;Byte count
Location 45150 contains the address of the second word of the current
buffer, which contains the address of the next buffer in the buffer
ring, and so forth. You can locate all the buffers in the ring using
the same method:
44351/ 176,,44551 ;Buffer 1
44551/ 176,,44751 ;Buffer 2
44751/ 176,,44151 ;Buffer 3
44151/ 176,,44351 ;Buffer 4
Therefore, four buffers are set up. The right half of each buffer
header word points to the next buffer in the ring. The left half
holds the use bit and the buffer size. Bit 0 is the use bit (BF.IOU);
its meaning depends on the type of buffer:
                 Buffer Empty    Buffer Full
Input Buffer          0               1
Output Buffer         1               0
In the left half of the header words listed above, Bit 0 is off,
indicating that the output buffers were full. The remainder of the
left half holds the buffer size, in this case, 176 (octal) words.
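The walk around the buffer ring can also be sketched in Python, which is convenient if you have copied the relevant words out of the dump. The sketch below uses the four header words from this example. The dictionary standing in for user memory and the function name are assumptions made for illustration.

```python
HALF_MASK = 0o777777
USE_BIT = 0o400000          # Bit 0 of the left half (BF.IOU)


def walk_buffer_ring(memory, start):
    """Follow the ring from the header word at start until it closes.

    memory maps address -> 36-bit word.  Returns one tuple per buffer:
    (address of header word, use bit set?, buffer size in words).
    """
    ring = []
    addr = start
    while True:
        word = memory[addr]
        left = (word >> 18) & HALF_MASK
        right = word & HALF_MASK
        ring.append((addr, bool(left & USE_BIT), left & ~USE_BIT & HALF_MASK))
        addr = right
        if addr == start:
            return ring
        if len(ring) > len(memory):
            raise ValueError("ring does not close")


# Header words from the example, written as left,,right halves.
dump = {
    0o44351: (0o176 << 18) | 0o44551,
    0o44551: (0o176 << 18) | 0o44751,
    0o44751: (0o176 << 18) | 0o44151,
    0o44151: (0o176 << 18) | 0o44351,
}
for addr, use_bit, size in walk_buffer_ring(dump, 0o44351):
    print("%o: use bit %d, size %o" % (addr, use_bit, size))
```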
To read the contents of the first buffer, use the following commands:
$$7T
44151/ @pHt@
44152/ }
44153$0T/ GLE File format:ASCII Print mode:ASCII /DELETE _^L
GGGGGGGGGGGG RRRRRRRRRRRR IIIIIIIII PPPPPPPPPPPP EEE
EEEEEEEEEEEEEEEE ...
The rest of the buffer contains the banner page printed by LPTSPL
immediately before printing a file. LPTSPL had just begun printing a
file when the system crashed.
Job 7 is using two DDBs, but it is also important to check the
extended channel table for the job. In this case, it reveals more
DDBs. Note that the left half of the pointer to the extended channel
table does NOT contain a section number, as might seem immediately
apparent. Only the right half of this word is a valid pointer to
data:
.UPMP+USCTA[ 21,,341200
341200[ 651500,,340000 ;Channel 20
341201[ 651400,,340063 ;Channel 21
341202[ 651400,,340146 ;Channel 22
These DDBs are in funny space, so they are disk DDBs. They contain
the following file names: SYS:LPFORM.INI[1,4], DSKC:ERROR.FS[6,6],
and DSKC:GRIPE.SRJ[1,2]. The DDBs are displayed as follows:
340000/ SYS
340000 DEVNAM/ LPFORM
340000 DEVEXT/ INI (
340000 DEVPPN[ 1,,4
340063/ DSKC
340063 DEVNAM/ ERROR
340063 DEVEXT/ FS A
340063 DEVPPN[ 6,,6
340146/ DSKC
340146 DEVNAM/ GRIPE
340146 DEVEXT/ SRJ
340146 DEVPPN[ 1,,2
Because the banner page that was being printed has the file name
GRIPE, it is clear that the third disk DDB is associated with the file
that was being printed at the time of the crash.
4.12 LINE DATA BLOCKS (LDBS)
The monitor uses terminals in two different ways: they are the means
to enter commands directly to the monitor, and they are also subject
to control by user programs. To serve both functions, there are two
data structures: the terminal DDB and the Line Data Block (LDB).
LDBs contain information about a terminal line. There is one LDB for
each terminal and it is built when the monitor is initialized. LDBs
are not created dynamically; they continue to exist as long as the
system is in operation. This allows users to type commands on
terminals even though they are not logged in, and permanent LDBs speed
response because the monitor does not have to spend the time
allocating an LDB. The code to allocate and initialize the LDBs is in
SCNSER, and it is discarded when system initialization is complete.
In general, an LDB contains:
o Pointers to input and output chunks (terminal I/O buffers)
o Counts of how many characters are currently in the chunks
o Pointer to its associated DDB
o Line status bits
o Line characteristic bits
o Position counter
o MIC information
o Break characters
o Count of characters to echo
You can use LINTAB to locate the LDB entry for a terminal line.
LINTAB contains one entry for each terminal in the system (including
CTYs and PTYs). Use the TTY number as the offset into LINTAB. The
LINTAB entry (a fullword global address) points to the LDB, and the
first word of the LDB points to the terminal DDB (if the terminal DDB
exists).
4.13 THE SCNSER DATA BASE
SCNSER processes user input and calls the appropriate module to handle
the I/O. The SCNSER data base is composed of the following virtual
memory sections:
Data         Memory Section   Used for
LINTAB       Section 0        Translates line no. to LDB addr
DSCTAB       Section 0        Translates modem no. to line no.
DDB pool     Section 0        TTY device data blocks
LDBs         Section 4        Line data blocks
Chunk pool   Section 4        Buffers
4.14 TERMINAL CHUNKS
Terminal data is usually stored in eight-word buffers called TTY
chunks. In 12-bit ASCII mode, the terminal chunk size varies.
Examine the value of TTCHKS to see the current size of a terminal
chunk. The terminal chunk starts with a pointer to the previous
chunk, and a pointer to the next chunk, followed by the character
data.
Chunks are maintained as doubly linked lists, using halfword links
relative to Section 4. Each terminal line can potentially have four
linked lists of chunks: one for input, one for output, a list for
filler characters, and a list for out-of-band characters. When chunks
are no longer needed by a terminal line, they are returned to a free
list of chunks. The LDB contains pointers to the chunks.
Each character in a chunk is stored as a 12-bit byte, permitting a
maximum of 21 characters to be stored in a chunk (3 to a word). In
reading the characters in terminal chunks using FILDDT, use the $12T
command to break up the 36-bit word into 12-bit bytes (4 bits for
flags + 8 bits for data).
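The arithmetic behind these figures can be checked with a short Python sketch. An eight-word chunk uses one word for the two halfword links, leaving seven data words of three bytes each, or 21 characters. The sketch also splits a data word into its three 12-bit bytes. It assumes that the four flag bits are the high-order bits of each byte, which the text above does not state; the function names are hypothetical.

```python
def chars_per_chunk(chunk_words=8, link_words=1, bytes_per_word=3):
    """Number of 12-bit characters that fit in one terminal chunk."""
    return (chunk_words - link_words) * bytes_per_word


def unpack_chunk_word(word):
    """Split a 36-bit word into three (flags, data) pairs.

    Each 12-bit byte is assumed to hold 4 flag bits in its high-order
    end and 8 data bits in its low-order end.
    """
    pairs = []
    for shift in (24, 12, 0):
        byte = (word >> shift) & 0o7777
        pairs.append((byte >> 8, byte & 0o377))
    return pairs


# A word holding the characters "MEM" with no flag bits set.
word = (ord("M") << 24) | (ord("E") << 12) | ord("M")
print(chars_per_chunk())
print("".join(chr(data) for _flags, data in unpack_chunk_word(word)))
```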
The monitor keeps all the chunks in a pool. The TTYINI routine, in
SCNSER, initializes the chunks, allocating space for them and creating
the links.
The location TTFTAK points to the first free chunk in the pool. When
a terminal needs a chunk, it gets the chunk pointed to by this
location. TTFPUT points to the last free chunk in the list and
returned chunks are stored after this chunk. TTFREN contains the
number of free chunks in the system. The LDCHK, LDCHKR, and STCHK
macros place characters in the chunks and remove characters from them.
These macros are useful in terminal handling, but they should not be
called when SCNSER interrupts are enabled.
o LDCHK takes a character out of a chunk, and does not give
back used chunks (useful when echoing input).
o LDCHKR takes a character out of a chunk and returns used
chunks to the pool, if necessary.
o STCHK puts a character in a chunk, allocating chunks from the
pool, if necessary.
4.15 TERMINAL DEVICE DATA BLOCKS
Terminal device data blocks are allocated from the TTY DDB pool as
jobs are created, or as the terminal is assigned by a job on another
terminal. Some types of information that are stored in the terminal
DDB are:
o Pointers to user buffers
o Device and logical names for the terminal
o I/O status information (DEVSTA)
o Device mode information (DEVMOD)
o CPU number of the CPU that owns this terminal
o Pointer to the LDB
Every job has a terminal DDB for its controlling terminal, whether the
job is attached or not. Terminal DDBs are created when a job number
is assigned (that is, when a program is run) and when a terminal is
assigned or OPENed by another job. If the job is not logged in when
the program finishes, the DDB is deleted. If the job is logged in,
the DDB remains until the job logs out or detaches.
TTYTAB is a table in COMMON that has one entry per job and points to
the DDB of the controlling (attached) terminal of the job. If a
program opens a software channel for a terminal, an entry is made in
the channel table for the terminal.
LDBs and DDBs are linked when a job is created or a terminal is
attached to a job. These links are destroyed when:
o You log out or detach your job.
o A node goes down while the terminal is connected.
o You hang up the modem of a terminal that is connected.
o You release a terminal on a software channel.
TTYATI attaches the terminal to the job when the job is created;
TTYATT attaches the terminal for the ATTACH command.
4.16 FINDING TERMINAL I/O INFORMATION
The following example shows how to extract information from the
terminal chunks for a job. In this case, you are examining Job 17,
which is running PIP. First, look at TTYTAB, which points to the
terminal DDB for the job:
TTYTAB+21[ 102206
102206$6T/ TTY124
As the first word of the block verifies, it is a terminal DDB. Next,
find the LDB by looking at the DDBLDB word:
102206+DDBLDB[ 4,,450430
4,,450430[ 102206
The DDB pointer in the first word of the LDB is correct. Next,
examine the LDB:
4,,450431[ 0
4,,450432[ 100000,,0
4,,450433[ 10000,,0
4,,450434[ 0
4,,450435[ 0
4,,450436[ 0
4,,450437[ 1400,,426522
4,,450440[ 1400,,426522
4,,450441[ 0
4,,450442[ 0
4,,450443[ 301400,,422450 ;Ptr to put output characters
4,,450444[ 301400,,430276 ;Ptr to take output characters
4,,450445[ 2137 ;No. of characters in output
The pointers are PDP-10 byte pointers. The memory address in the
right half points to the terminal chunk, which can be displayed by:
4,,430276$12T/ <space><space><
The pointer is in the middle of the chunk. Determine the chunk size,
in order to know where the chunks begin and end:
TTCHKS=10
Now, start from a few locations back, and you can see:
4,,430275/ 10 ^
4,,430274/ ^
4,,430273/ MEM ^
4,,430272/ DT ^
4,,430271/ ^@!^Q =417221
The contents of location 4,,430271 are a backward pointer in the left
half, and the location of the next chunk in the right half. The chunk
itself holds the text "DT MEM 10 <***>."
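The output pointers in the LDB listing above, such as 301400,,430276, are one-word PDP-10 byte pointers. The following Python sketch separates such a word into its fields using the standard byte pointer layout: position (P) in Bits 0-5, size (S) in Bits 6-11, the indirect bit in Bit 13, the index register in Bits 14-17, and the address (Y) in Bits 18-35. The function name is hypothetical. The address is relative to the section that holds the chunks.

```python
def decode_byte_pointer(word):
    """Split a one-word PDP-10 byte pointer into its fields."""
    return {
        "position": (word >> 30) & 0o77,     # P: bits to the right of the byte
        "size": (word >> 24) & 0o77,         # S: byte size in bits
        "indirect": bool((word >> 22) & 1),  # I bit
        "index": (word >> 18) & 0o17,        # X: index register
        "address": word & 0o777777,          # Y: word address
    }


pointer = (0o301400 << 18) | 0o430276        # 301400,,430276
fields = decode_byte_pointer(pointer)
print("P=%d S=%d Y=%o" % (fields["position"], fields["size"],
                          fields["address"]))
```

For this pointer the size field is 12, which agrees with the 12-bit characters stored in terminal chunks, and the position field is 24, which places the byte in the leftmost third of word 430276.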
By examining the next chunks, you can deduce the entire message:
DT MEM 10 <***> 666405 9-Jul-80
BOOT11 DOC 10 <***> 157023 27-Jul-79 4A(46)
BOOT11 EXE 28 <***> 411354 26-Jul-82 4A(46)
BOOT11 HLP 2 <***> 500576 5-Jan-75
BOOT11 MAC 108 <***> 010501 27-Jul-79
BOOT11 MEM 29 <***> 544353 27-Jul-79
BOOTS DOC 35 <***> 352703 17-Jul-79
BOOTS EXB 10 <***> 556224 26-Jul-82
BOOTS MAC 92 <***> 764007 31-Jul-79
BT128K EXB 10 <***> 605464 26-Jul-82
BT256K EXB 10 <***> 556224 26-Jul-82
WIBOOT EXE 32 <***> 607553 30-Nov-79 7(12)
WLBOOT EXE 32 <***> 325717 30-Nov-79 7(12)
WSBOOT EXE 24 <***> 631454 30-Nov-79 7(12)
WTBOOT DOC 18 <***> 451662 28-Jun-79
WTBOOT MAC 29 <***> 007472 20-Jul-79
DML6A DOC 3 <***> 331675 7-Mar-79
DMPFIL EXE 16 <***> 071372 16-Jul-80 6A(7)
DMPFIL MAC 34 <***> 661675 7-Mar-79
DMPFIL MEM 5 <***> 077054 8-Mar-79
COPY EXE 8 <***> 605250 17-Jul-80 7(101)
CPY007 DOC 4 <***> 507510 8-Mar-79
DTC007 DOC 3 <***> 204110 8-Mar-79
DTCOPY EXE 20 <***> 456574 17-Jul-80 7(101)
DTCOPY MAC 43 <***> 303311
The user was reading a BACKUP tape directory listing when the system
crashed.
4.17 TAPE DRIVES
The data structures for tape drives parallel the actual hardware
components. Depending upon the hardware interface, a magtape
controller may be connected to as many as 15 drives. The software has
up to 15 tape unit data blocks (TUBs) connected to a tape controller
data block (KDB), which then points to a channel data block (CHN).
There is one TUB for each tape unit in the system. It contains the
unit name, pointers to the DDB and controller, error counts, tape
label information, and a pointer to the IORB (I/O request block, the
request to the controller outlining the I/O transfer). The first word
in each TUB is the SIXBIT name of the tape unit, in the form:
MTxy
where x is the controller name and y is the unit number. For example:
MTA0
The prototype TUBs are:
Symbol Units
DX1UDB DX10/TX01/TX02
T78UDB TM78
TCXUDB TC10C
TM2UDB TM02/TM03
TMXUDB TM10B
TS1UDB SA10/TX01/TX02
TX2UDB DX20/TX02
The KDB identifies a controller and there is one for each tape
controller in the system. It holds the name of the controller, a
pointer to the next KDB, the channel command list, a list of TUBs
owned by the controller, and controller-dependent information. In the
monitor, KDBs are pointed to by KDBTAB+.TYxxx. The name of the
controller is stored in the first word as MTn, where n is the
controller number. The KDB also points to the channel it is connected
to.
The prototype KDBs are:
DX1KDB
T78KDB
TCXKDB
TM2KDB
TMXKDB
TS1KDB
TX2KDB
Channel data blocks exist for channels that are connected to any type
of controller. They hold enough information to start and monitor the
channel transfer, including:
o Error counts
o Retry information
o Channel status
o Channel queue
At system startup, AUTCON creates one magtape DDB for each unit on
each controller. The start of a magtape DDB can be obtained from
DDBTAB+.TYMTA. The magtape DDB is named:
MTxu
where:
x is the alphabetic controller name (A for controller 0, B for
controller 1, and so forth)
u is the unit number
A special magtape DDB (called a Label DDB) is required for the tape
label processor (PULSAR). This is needed so I/O can be performed by
two different jobs (the user job and the job running PULSAR), while
the device remains assigned to the user job. The label information is
stored in the Tape Unit Data Block (TUB), which is common to both the
magtape and the label DDB.
The name of a label DDB is in the form:
'Lxu
The values of x and u are the same as shown above for the magtape DDB.
The label DDB has the same format as a magtape DDB.
4.18 DISKS
Disks are the most complex peripheral I/O devices in a timesharing
system. They are shared among jobs, using a logically structured file
system to store data and prevent destructive interference. The basic
unit of disk storage is one block (equal to 128 words).
TOPS-10 organizes information into logical groups known as files. The
contents of a file are referenced by the file specification, which
uniquely identifies the file. A file specification has four
components:
o A file structure name, which identifies the disk drive or
group of disk drives where the file is stored
o An ordered list of directory names (MFD, UFD, and SFDs, if
any)
o A file name of one to six alphanumeric characters
o A file extension of zero to three alphanumeric characters
A file structure name is a logical device name that refers to one or
more physical disk units. Using the file structure name, the user job
need never know the exact physical unit where data is stored.
The directory where a file is stored helps to uniquely identify the
file. TOPS-10 organizes files by using file structures, User File
Directories (UFDs), and Sub-File Directories (SFDs). A UFD or SFD is
itself a file, and contains a list of all files for a user, and a
pointer for accessing those files.
The Master File Directory (MFD) points to all the UFDs on a particular
disk file structure. There is one MFD for each file structure,
containing the names and addresses of all the UFDs on that structure.
Each UFD can optionally contain Sub-File Directories (SFDs). An SFD
is a logical group of files within the UFD. SFDs can contain their
own sub-file directories, which can be nested to a level of five SFDs
in a single UFD.
The UFD is named with the user's PPN, in brackets. For example, the
user with PPN 10,507 has the following UFD:
[10,507]
You specify an SFD by typing the name of the UFD, followed by the name
of the SFD (up to six alphanumeric characters). For example, the UFD
[10,507] could contain a file called FIRST.SFD. To access the files
in this SFD, the user specifies the following directory:
[10,507,FIRST]
In the SFD, the user keeps a file called SECOND.SFD, which points to a
nested SFD. To access files in the nested SFD, the user types the
following directory name:
[10,507,FIRST,SECOND]
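The directory syntax above can be sketched in Python. This is an illustration only, not monitor code: the function name parse_directory is hypothetical, and the sketch assumes that the project and programmer numbers are written in octal, as PPNs are on TOPS-10.

```python
MAX_SFD_DEPTH = 5  # nesting limit stated in the text


def parse_directory(spec):
    """Split '[10,507,FIRST,SECOND]' into a PPN pair and a list of SFD names."""
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("directory must be enclosed in brackets")
    parts = [p.strip() for p in spec[1:-1].split(",")]
    if len(parts) < 2:
        raise ValueError("a directory needs a project and a programmer number")
    # Assumption: both halves of the PPN are octal numbers.
    project, programmer = int(parts[0], 8), int(parts[1], 8)
    sfds = parts[2:]
    if len(sfds) > MAX_SFD_DEPTH:
        raise ValueError("SFDs nest at most five deep")
    for name in sfds:
        if not (1 <= len(name) <= 6 and name.isalnum()):
            raise ValueError("bad SFD name: " + name)
    return (project, programmer), sfds


ppn, sfds = parse_directory("[10,507,FIRST,SECOND]")
print("%o,%o" % ppn, sfds)
```

The UFD is identified by the PPN pair alone; each additional name selects one more level of SFD.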
The monitor does not write the data on disk in physically consecutive
disk blocks. The monitor must allocate disk space effectively in a
dynamic situation where users are constantly creating, deleting,
modifying, and appending to variable-length files. Therefore, the
monitor segments disk space into blocks and stores files in space that
is available throughout the file structure.
To maintain this complex storage system, the monitor must maintain
some amount of overhead data for retrieving files and allocating disk
space. The RIB (Retrieval Information Block) contains the retrieval
information for the file.
A RIB is a block on the disk that contains retrieval pointers to the
blocks making up the entire file. The UFD points to the first RIB for
each file. Each retrieval pointer in the RIB describes a contiguous
block of data called a "group." The retrieval pointer contains the
first physical disk address of the group and the number of blocks that
are in the group. UFDs and MFDs also have RIBs to describe their
locations on the disk unit.
A retrieval pointer contains the following information:
o The number of clusters in this group
o The cluster number where the group starts
o The checksum for the group
One of the following conditions is possible, if the left half of the
retrieval pointer is zero:
o If Bit 18 = 1, Bits 19 through 35 contain the logical unit
number of the next unit to get data from. This allows one
RIB on one unit to hold pointers to data on another unit in
the same structure.
o If the right half is zero, there is no more data in the file.
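The two special cases above can be expressed as a small Python sketch. It uses only the rules stated in the text; the function name classify_pointer is hypothetical. The widths of the count, address, and checksum fields are not given here, so an ordinary group pointer is recognized but not decoded.

```python
HALF = 0o777777   # mask for one 18-bit half word
BIT18 = 0o400000  # Bit 18 is the high-order bit of the right half


def classify_pointer(word):
    """Classify a 36-bit retrieval pointer using only the rules in the text."""
    left, right = (word >> 18) & HALF, word & HALF
    if left != 0:
        return ("group", None)            # ordinary pointer, fields not decoded
    if right == 0:
        return ("end-of-file", None)      # no more data in the file
    if right & BIT18:
        return ("unit-change", right & 0o377777)  # Bits 19 through 35
    return ("unspecified", None)          # case not described in the text


print(classify_pointer(0o400000))
print(classify_pointer((0o4010 << 18) | 0o100332))
```

The first call uses the value 400000, which is a change to logical unit 0; the second uses 4010,,100332, which is an ordinary group pointer.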
If a file needs more than one RIB to retrieve the data, it has
extended RIBs at the start of subsequent groups. The monitor also
writes an extra copy of each RIB as the last block pointed to by the
RIB, for disk error recovery purposes. That copy is known as the
spare RIB. The first RIB is known as the prime RIB.
Each disk unit contains a HOME block, which describes the file
structure that contains the disk unit, and points to the MFD. Blocks
1 and 10 (decimal) on the disk contain the HOME block, which records
the following information:
o The file structure to which this unit belongs, and the unit's
position within the structure
o The characteristics of the unit and file structure
o A pointer to the MFD
The monitor uses the HOME block to find the MFD when the file
structure is mounted for a user.
The monitor keeps information about used disk blocks in the Storage
Allocation Tables (SAT blocks). The SAT block on each file structure
is stored as SYS:SAT.SYS. Each bit in the SAT block represents a
group of contiguous disk blocks called a cluster.
The smallest unit of data on disk that the monitor can allocate is the
cluster, which is composed of a specific number of disk blocks. A
small disk unit might use a cluster size of 3 blocks (600 octal words). If
the monitor must allocate space to a file that is smaller than
200 (octal) data words, an entire cluster is allocated. When the
cluster size increases, fewer SAT blocks are required for storage
allocation information; with fewer reads/writes to the SAT, a smaller
number of operations is required to assign and release disk space.
Large clusters save memory at the expense of disk space. Because disk
space is allocated in clusters, short files result in wasted space if
the cluster size is too large.
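The trade-off can be made concrete with a short Python calculation. This is a sketch under simplifying assumptions: it counts data words only and ignores the blocks used by RIBs, and the function name allocation is hypothetical.

```python
WORDS_PER_BLOCK = 128  # 200 octal


def allocation(file_words, blocks_per_cluster):
    """Return (clusters allocated, words wasted) for a file of file_words words."""
    cluster_words = blocks_per_cluster * WORDS_PER_BLOCK
    clusters = max(1, -(-file_words // cluster_words))  # ceiling division
    wasted = clusters * cluster_words - file_words
    return clusters, wasted


for size in (3, 10):
    print(size, allocation(100, size))
```

A 100-word file occupies one cluster in both cases, but the unused space grows from 284 words with a 3-block cluster to 1180 words with a 10-block cluster.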
The MFD contains pointers to the UFDs on the disk unit. The UFD
contains a two-word entry for each file in the UFD. The UFD entry
specifies the file name in the first word, and file extension in the
left half of the second word and a pointer to the file in the right
half of the second word called the compressed file pointer (CFP). The
CFP is the 18-bit address of the RIB of the file, pointing to the
first supercluster of the file. A supercluster is a set of clusters
stored contiguously on disk. A file always starts at the supercluster
boundary, but one file may fill many superclusters of disk space.
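A two-word UFD entry can be decoded as in the following Python sketch. The helper names are hypothetical. SIXBIT characters are six bits each, and adding 40 (octal) to a SIXBIT code gives the ASCII code.

```python
def sixbit(value, nchars):
    """Decode nchars SIXBIT characters from the low-order bits of value."""
    chars = []
    for i in range(nchars):
        shift = 6 * (nchars - 1 - i)
        chars.append(chr(((value >> shift) & 0o77) + 0o40))
    return "".join(chars).rstrip()


def decode_ufd_entry(name_word, second_word):
    """Return (file name, extension, compressed file pointer)."""
    name = sixbit(name_word, 6)
    ext = sixbit(second_word >> 18, 3)
    cfp = second_word & 0o777777
    return name, ext, cfp


name, ext, cfp = decode_ufd_entry(0o502621260000, 0o647064016424)
print(name, ext, "%o" % cfp)
```

In an MFD entry the first word holds the PPN of the UFD as two binary half words rather than a SIXBIT name, so only the second word would be decoded this way.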
The number of blocks per cluster is usually equivalent to the number
of blocks per supercluster. However, if the total number of clusters
on a file structure is greater than 262,143, the clusters are
regrouped into superclusters such that the number of superclusters is
less than or equivalent to 262,143 (the largest number that can be
stored in the right half of the second word in the UFD entry). The
number of clusters per supercluster is stored in the HOME block, and
in the STR block when the monitor is running.
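The constraint can be illustrated with a short Python sketch. It assumes that the smallest grouping satisfying the limit is chosen; the text states only the limit itself, and the function name is hypothetical.

```python
MAX_CFP = 262143  # largest value that fits in an 18-bit half word


def clusters_per_supercluster(total_clusters):
    """Smallest grouping that keeps the supercluster count within MAX_CFP."""
    return max(1, -(-total_clusters // MAX_CFP))  # ceiling division


for total in (200000, 262144, 600000):
    print(total, clusters_per_supercluster(total))
```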
4.18.1 Finding Information on Disk
The following example shows how to use FILDDT to retrieve information
stored on a disk, using the /U switch to look at a disk unit. This
example shows how to locate the contents of the file
DSKA:H616.TXT[64,2]; DSKA is mounted on RPB1.
First, run the monitor-specific FILDDT (MONDDT in this manual), and
specify the physical disk unit you want to examine, followed by the /U
switch:
.R MONDDT
File:RPB1:/U
/U requires that you be logged in as [1,2], and instructs FILDDT to
treat the disk as addressable.
The first data structure to use in examining the file is the HOME
block. It holds pointers to other files, and can always be found at
Blocks 1 and 10 (decimal) on a disk. To access the first word of the
HOME block, specify location 200 to FILDDT. Each block is
128 (decimal) words, which equals 200 (octal).
Remember to convert disk block numbers to FILDDT addresses by
multiplying by 200. If converting cluster addresses, multiply by
200*n, where n is the cluster size. For example, if the cluster size
is 5, use the following calculation to specify the block number. (The
numeric base of the following calculations is indicated by (8) for
octal and (10) for decimal.)
Block 15(10) = Block 17(8); 17(8) * 200(8) = 3600(8) in FILDDT
Cluster 11(10) = Cluster 13(8); 13(8) * 5 = Block 67(8); 67(8) * 200(8) = 15600(8) in FILDDT
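The same conversions can be checked with a few lines of Python. This is only a convenience sketch; the function names are hypothetical. The halves function prints an address in the LH,,RH form that FILDDT uses for addresses above 777777.

```python
WORDS_PER_BLOCK = 0o200  # 128 decimal


def block_address(block):
    """FILDDT address of the first word of a disk block."""
    return block * WORDS_PER_BLOCK


def cluster_address(cluster, blocks_per_cluster):
    """FILDDT address of the first word of a cluster."""
    return block_address(cluster * blocks_per_cluster)


def halves(addr):
    """Format an address as octal 'LH,,RH', or plain octal if LH is zero."""
    lh, rh = addr >> 18, addr & 0o777777
    return "%o,,%o" % (lh, rh) if lh else "%o" % rh


print(halves(block_address(15)))        # Block 15 decimal
print(halves(cluster_address(11, 5)))   # Cluster 11 decimal, 5 blocks per cluster
print(halves(block_address(0o4204)))    # Block 4204 octal
```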
To examine the HOME block, type the following:
200/ HOM ;Name of HOME block
201/ DSKA01 ;Unit ID
202/ 0
203/ 0
204/ DSKA ;Structure name
The pointer to the MFD's RIB is at offset HOMMFD:
200+HOMMFD/ 4204
This location contains the block number. All subsequent addresses are
cluster numbers. The size of a cluster is stored in the HOME block at
location HOMBSC:
200+HOMBSC/ 12 ;Blocks per supercluster
200+HOMBPC/ 12 ;Blocks per cluster
In this case, a cluster is 10 (decimal) blocks.
The MFD's RIB confirms that you have the correct RIB:
4204*200/ 777653,,41
1,,41001/ 1,,1 ;Owner of file
1,,41002/ 1,,1 ;File name
1,,41003/ UFD)EC ;File extension in left half
Examine the first retrieval pointer to find the MFD itself. The right
half of the contents of the first word in the RIB contains the offset
within the RIB to the first retrieval pointer. The left half of the
first word is the negative of the maximum number of retrieval pointers
that may be stored in the RIB.
1,,41001+41/ 400000 ;Unit change pointer to Unit 0
1,,41002+41/ 4010,,100332 ;1st real retrieval pointer
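The first word of the RIB can be decoded as in this Python sketch. The function name is hypothetical. The left half is treated as an 18-bit two's complement number, as the description above implies.

```python
def decode_rib_header(word):
    """Return (maximum number of retrieval pointers, offset to the first one)."""
    lh, rh = (word >> 18) & 0o777777, word & 0o777777
    max_pointers = (0o1000000 - lh) if lh else 0  # negate the 18-bit left half
    return max_pointers, rh


count, offset = decode_rib_header(0o777653000041)
print("%o pointers at most, first at offset %o" % (count, offset))
```

For the word 777653,,41 this gives room for 125 (octal) retrieval pointers, the first of which is at offset 41 in the RIB.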
The first cluster of the MFD is number 332. This corresponds with
Block 332*12=4204 (octal), the address of the RIB (stored in HOMMFD,
shown above). The RIB is stored in the first block of the
supercluster when the file is initially allocated. The monitor checks
to see if the RIB address is the same as the first group of data. If
so, the monitor retrieves the second block for data. Look at 1,,41200
(4204*200+200) for the MFD:
1,,41200/ 1,,1 ;[1,1] UFD
1,,41201/ UFD. : = 654644,,332
1,,41202/ 1,,4 ;[1,4] UFD
1,,41203/ UFD = 654644,,3
1,,41204/ 3,,3 ;[3,3] UFD
1,,41205/ UFD > = 654644,,336
1,,41206/ 10,,1 ;[10,1] UFD
1,,41207/ UFD ? = 654644,,337
1,,41210/ 1,,2 ;[1,2] UFD
1,,41211/ UFD @ = 654644,,340
1,,41212/ 1,,5 ;[1,5] UFD
1,,41213/ UFD A = 654644,,341
1,,41214/ 1,,3 ;[1,3] UFD
1,,41215/ UFD B = 654644,,342
1,,41216/ 64,,2 ;[64,2] UFD
1,,41217/ UFD E = 654644,,345
The first word of each two-word MFD entry contains the UFD name. The
second word contains the UFD extension in the left half and the
supercluster address of the RIB in the right half. The RIB for the
[64,2] UFD is located at supercluster 345 (assuming that each
supercluster contains one cluster).
345*12*200/ 777653,,41
1,,RNA2CB+71[ 1,,1 ;Owner of file
1,,RNA2CB+72[ 64,,2 ;File name
1,,RNA2CB+73/ UFD)EC ;LH = file extension
345*12*200+41/ 400000
1,,RNA2CB+133/ 1000,,345 ;Location of UFD
Again, the RIB takes up the first block of the cluster. Add
200 (octal) to the address of the RIB to get the first data block of
the UFD. If the cluster size is 1 block, you have to read the
retrieval pointer for the first data block.
345*12*200+200/ F601
1,,RNA3CB+71/ EXE &S
1,,RNA3CB+72/ D602
1,,RNA3CB+73/ EXE GN
.
.
.
1,,DSKDB+1/ H616
1,,DSKDB+2/ TXT!T4 =647064,,16424
The location of the RIB for the file is at Supercluster 16424:
16424*12*200/ 777653,,41
44,,262001/ 64,,2
44,,262002/ H616
44,,262003/ TXT)CT ;LH = file extension
16424*12*200+41/ 400000
44,,262042/ 1655,,616424
Finally, you reach the file, which contains:
44,,262200/
DATA
44,,262201/ A AT
44,,262202/ TIME
44,,262203/ OF SE
44,,262204/ R062.
44,,262205/ CRASH
44,,262206/
44,,262207/ VMA,
44,,262210/ PC=53
44,,262211/ 7771
44,,262212/
(FRO
44,,262213/ M KLD
44,,262214/ CP AL
44,,262215/ L COM
44,,262216/ MAND)
.
.
.
Reformatting to make reading easier yields the following:
DATA AT TIME OF SER062.CRASH
VMA, PC=537771
(FROM KLDCP ALL COMMAND)...
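The address arithmetic used throughout this example can be summarized in a Python sketch. It assumes, as in the example, that the RIB occupies the first block of the supercluster and that the data begins in the next block. The function names are hypothetical.

```python
def rib_address(cfp, blocks_per_supercluster):
    """FILDDT address of a RIB, given its compressed file pointer."""
    return cfp * blocks_per_supercluster * 0o200


def first_data_address(cfp, blocks_per_supercluster):
    """Address of the block after the RIB (valid when a cluster has 2+ blocks)."""
    return rib_address(cfp, blocks_per_supercluster) + 0o200


def halves(addr):
    """Format an address as octal 'LH,,RH', or plain octal if LH is zero."""
    lh, rh = addr >> 18, addr & 0o777777
    return "%o,,%o" % (lh, rh) if lh else "%o" % rh


BPS = 0o12  # blocks per supercluster, from HOMBSC in the example
print(halves(rib_address(0o345, BPS)))            # RIB of the [64,2] UFD
print(halves(rib_address(0o16424, BPS)))          # RIB of H616.TXT
print(halves(first_data_address(0o16424, BPS)))   # first data block of H616.TXT
```

The last two results, 44,,262000 and 44,,262200, agree with the addresses FILDDT displays for the RIB and the data of H616.TXT.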
4.18.2 In-Core File Information
To keep accurate information in a readily accessible place, the
monitor maintains information about the following, in memory:
o Structure information
o Device information
o File information
o User information
To access a file structure, the monitor keeps a file structure data
block called STR. It contains the name of the structure, allocation
information, swapping information, and pointers to MFD and HOME
blocks. The STRs are stored in a linked list, each entry pointed to
by the system table TABSTR. A structure is identified by the offset
into TABSTR where its entry is stored. The word SYSSTR points to the
first structure. The STR also points to the physical units in the
file structure.
The Unit Data Block (UDB) contains information about the physical disk
unit, including:
o Physical unit name
o Pointers to related UDBs
o Pointers to HOME blocks and SAT blocks
o Unit parameters (cluster size, and so forth)
The UDBs for each structure are linked and each UDB points back to the
STR. Because of these linkages, the STR points only to the first UDB.
The UDB addresses are dynamically assigned by AUTCON.
The STR accesses the following data structures:
o SABs (Storage Allocation Blocks) are in-core copies of the
SAT tables. Copies of the SATs are read into memory at
system startup and updated on disk after every write
operation.
o SPTs (Storage allocation Pointer Tables) contain pointers to
all SAT blocks for a unit. Do not confuse the SPTs (Storage
allocation Pointer Tables) used in disk I/O, with the SPT
(Special Pages Table) used in mapping user jobs into physical
memory.
o The PWQ (Position Wait Queue) is an ordered list of DDBs that
have positioning requests for that unit.
The controller data block (KON) is connected to the UDB and contains
information about the device controller for that unit. The channel
data block (CHN) is linked to the KON and contains information about
the hardware channel associated with that disk controller. The CHN
holds the transfer wait queue (TWQ) for the disk drives on that
channel.
The PWQ and the TWQ contain information for performing I/O requests,
and the order in which they are to be serviced. Both of these queues
are required to drive a disk device. The format and naming scheme is
the same as the channel data block for tape drives.
Only the static state of the file system can be described here. In a
timesharing environment, jobs can modify files while the same files
are being used by other jobs. The monitor requires special
information for the contention-free management of the files. To keep
track of currently open files, the monitor's data base shows the
versions of all open files for all PPNs at any given time.
The file data base is organized using the following data structures:
o The PPB, the PPN data Block, contains information about all
files for a specific PPN. There is one PPB for each PPN that
has open files. All PPBs for all jobs are linked together;
the first is pointed to by SYSPPB.
o The NMB, the Name Block, contains the file names of all open
files on all file structures for a PPN. There is one NMB for
each open file of each PPN, regardless of the number of
versions of the file that are in existence. A word in the
PPB points to the first NMB in a list.
o The ACC, the access table, contains information needed to
gain access to a specific version of a specific file. The
location of the first RIB is stored here, with the file
structure number. The ACC entries are linked in a ring
through the NMB.
At any time there are two possible versions of a file: the
current version and the superseding version. Usually there
is only one ACC; but while the file is being superseded, both
the old and new versions of the file have ACCs linked to the
NMB. There may be several ACCs if the file exists on more
than one file structure, or older versions of a file are
still open.
o The UFB is a UFD data block. The monitor keeps a UFB for
each UFD for each file structure for your job. Each UFB
contains the first retrieval pointer to the UFD. The PPB
contains a pointer to the UFB for the first structure.
Every LOOKUP to a file is recorded in the PPB, the NMB, and the UFB.
If the monitor cannot find a file, it marks the NMB to indicate that
the file does not exist. Likewise, if the UFD does not exist, the
monitor marks the UFB accordingly. There are two words in each of
these data structures to contain this information. The first word is
the KNO word, short for KNOW. A bit in this word tells whether the
monitor has checked to see if the file or UFD exists. If the bit is
zero, a disk read is required to find out if the file exists. If the
bit is one, the corresponding bit in the second word, the YES word, is
valid. If the YES bit is zero, the file does not exist; if the YES bit
is one, the file does exist and there is probably information about it
in the PPB and NMB.
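The KNO/YES test can be written as a small Python sketch. It assumes that the same bit position is examined in both words; the text does not say which bit belongs to which file structure, so the mask values used here are illustrative only, and the function name is hypothetical.

```python
def lookup_state(kno, yes, bit):
    """Interpret one bit position of the KNO and YES words."""
    if not kno & bit:
        return "unknown"          # a disk read is needed to find out
    return "exists" if yes & bit else "does not exist"


print(lookup_state(0o410, 0o410, 0o400))
print(lookup_state(0o410, 0o400, 0o010))
print(lookup_state(0o410, 0o410, 0o001))
```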
The goal of this information storage is to reduce the number of disk
reads for discovering whether a file exists and where it is stored.
This is especially useful during debugging, when the same group of
files is used over and over again (source program, compiler, and
linker, for example). Of course, not all the file information can fit
into memory. The disk data structures are managed like a cache, where
the oldest entries are discarded in favor of those accessed more
recently.
The disk DDB is extremely important because it is the central source
of information for all disk I/O operations. It contains pointers and
links to many other data structures, including:
o The current retrieval pointers being used by the disk
routines, and the block numbers to which the pointers refer.
o Pointers to the UDB and STR where the file resides.
o Pointers to the buffer ring header and user buffers.
o The PWQ and the TWQ, which make a linked list of DDBs waiting
to use the disk and channel.
o Pointers to the ACC and UFD.
Disk DDBs are created when the device is OPENed and a software channel
is created; they are deleted when the channel is closed. Disk DDBs
are stored in the user's funny space.
4.18.3 The Software Disk Cache
The in-core file information that is being input or output can be
cached in memory, allowing the monitor to access disk information more
efficiently. The following data blocks are used in caching disk I/O
information.
The data structures for the software disk cache are two doubly linked
lists, a list header, and a hash table. Each entry in the list
contains forward and backward pointers for each of the two lists,
(.CBNHB, .CBPHB, .CBNAB, and .CBPAB), a UDB address (.CBUDB), a block
number (.CBBLK), and a pointer to the address in free core where the
block is (.CBDAT). For statistical purposes, the entry also contains
a count of the number of times the block has been accessed since it
was included in the list (.CBHIT).
The list header points to the two linked lists. The first linked list
is the "access" list. The most recently accessed block is at the top
of the list; the least recently accessed block is at the end. The
access list is linked through the .CBNAB/.CBPAB words.
The second linked list is the "free" list. It contains a list of all
blocks that are not currently in use and do not appear in the hash
table. The free list is linked through the .CBNHB/.CBPHB words.
The hash table consists of pointers to the free list corresponding to
the blocks that hash to the same position. Thus, the hash table
consists of separate list heads for the lists of blocks that hash to
that position in the hash table.
At initialization time (CSHINI), all the blocks are allocated and
linked into the free list. They are also linked into the access list.
The hash table entries are linked to themselves because the table is
empty.
To find an entry, given its UDB and block number, use the block number
as the offset into the hash table. Use the hash table entry as a list
head, following the list until you either find a match, or return to
the header. This is done with the CSHFND routine. In general, these
lists are very small, most commonly only one or two blocks.
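The lookup just described can be sketched in Python. This is not the CSHFND code: the class and method names are hypothetical, and reducing the block number modulo the table size is an assumption, because the text says only that the block number is used as the offset into the hash table.

```python
class Node:
    """A cache entry or a list head; an empty head is linked to itself."""

    def __init__(self, udb=None, blk=None):
        self.udb, self.blk = udb, blk
        self.next = self.prev = self


class HashTable:
    def __init__(self, size):
        self.heads = [Node() for _ in range(size)]

    def head_for(self, blk):
        return self.heads[blk % len(self.heads)]  # assumed hash function

    def add(self, udb, blk):
        """Link a new entry in directly after its list head."""
        head, node = self.head_for(blk), Node(udb, blk)
        node.next, node.prev = head.next, head
        head.next.prev = node
        head.next = node
        return node

    def find(self, udb, blk):
        """Follow the list until a match is found or the head is reached."""
        head = self.head_for(blk)
        node = head.next
        while node is not head:
            if node.udb == udb and node.blk == blk:
                return node
            node = node.next
        return None


table = HashTable(8)
table.add(0o142314, 0o4204)
table.add(0o142314, 0o4214)   # hashes to the same position in this sketch
print(table.find(0o142314, 0o4214) is not None)
print(table.find(0o142314, 0o4205))
```

Because the list is circular, reaching the head again is the end-of-list test; no null pointer is needed.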
The main cache handling routine is CSHIO, which will simulate I/O from
the cache, doing the necessary physical I/O to fill and write the
cache. Note that this is a write-through cache, so no sweeps are
required, and the data in the cache always reflects the blocks on
disk.
4.18.4 Finding In-Core File Information
The following example finds the file information stored in memory for
Job 3. First, you must set up paging for the job:
.C0EPT/ .E0EPT
$Q'1000$U
JBTNAM+3$6T/ ACTDAE ;Program name
JBTUPM+3[ 42000,,354 ;UPT at page 354
.$6U ;Mapping command
Then search for the assigned DDBs:
USRJDA[ 0 ;Channel 0
FOPBUF#+52[ 0 ;Channel 1
FOPBUF#+53[ 0 ;Channel 2
FOPBUF#+54[ 0 .
FOPBUF#+55[ 0 .
FOPBUF#+56[ 0 .
FOPBUF#+57[ 0 .
FOPBUF#+60[ 0 .
FOPBUF#+61[ 0 .
FOPBUF#+62[ 0 .
FOPBUF#+63[ 0 .
FOPBUF#+64[ 0 .
FOPBUF#+65[ 0 .
FOPBUF#+66[ 0 ;Channel 15
FOPBUF#+67[ 0 ;Channel 16
FOPBUF#+70[ 0 ;Channel 17
.USCTA[ 20,,741200 ;Check for extended channels
741200[ 564200,,740000 ;Channel 20
741201[ 560200,,740066 ;Channel 21
741202[ 474000,,740154 ;Channel 22
741203[ 403000,,740242 ;Channel 23
741204[ 441100,,740330 ;Channel 24
741205[ 474100,,740416 ;Channel 25
741206[ 0 ;Channel 26
741207[ 0 ;Channel 27
In this case, there are six open DDBs, all in the extended channel
table. They point to DDBs in funny space, so they must be for disk
files. Looking closer, you can find the names of the files. The
examples below show how this was done for the first three DDBs listed
above.
740000$6T/ ACT
DDB20: ;Label this as the DDB
DDB20+DEVFIL$6T/ USAGE ;for Channel 20.
DDB20+DEVEXT$6T/ OUT !
DDB20+DEVPPN[ 1,,7 ;ACT:USAGE.OUT[1,7]
740066$6T/ ACT
DDB21: ;Label this as the DDB
DDB21+DEVFIL$6T/ FAILUR ;for Channel 21.
DDB21+DEVEXT$6T/ LOG =
DDB21+DEVPPN[ 1,,7 ;ACT:FAILUR.LOG[1,7]
740154$6T/ ACT
DDB22+DEVFIL$6T/ USEJOB
DDB22+DEVEXT$6T/ BIN W
DDB22+DEVPPN$6T[ 1,,7 ;ACT:USEJOB.BIN[1,7]
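The scan of the extended channel table shown earlier can be sketched in Python. The words are copied from that listing; the function name is hypothetical. Only the right half of each entry is used, since the example uses it as the DDB address; the left half is not interpreted here.

```python
def open_channels(table, first_channel):
    """Map channel number to DDB address for every nonzero table entry."""
    result = {}
    for index, word in enumerate(table):
        if word:
            result[first_channel + index] = word & 0o777777
    return result


extended = [
    0o564200740000, 0o560200740066, 0o474000740154, 0o403000740242,
    0o441100740330, 0o474100740416, 0, 0,
]
for channel, ddb in open_channels(extended, 0o20).items():
    print("Channel %o: DDB at %o" % (channel, ddb))
```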
Now examine the USEJOB.BIN file. From the DDB, you can find which
unit the file is on:
DDB22+DEVUNI/ 142314,,142314 ;original UDB,,current UDB
142314$6T/ RAJ3 ;Physical device name
RAJ3: ;Label the UDB
RAJ3+UDBKDB[ 136770 ;KDB
RAJ3+UNILOG$6T/ DSKA0 ;Logical name within structure
RAJ3+UNIHID$6T/ DSKA0 ;HOME block ID name
RAJ3+UNISYS[ 142444,,46000 ;Next UDB in system,,bits
RAJ3+UNISTR[ 145324 ;Next UDB for STR
RAJ3+UNICHN[ 142444 ;Next UDB on channel
RAJ3+UNIKON[ 142444 ;Next UDB on controller
.
.
.
The unit is RAJ3, which is part of the structure DSKA.
Included in the UDB is a pointer to the structure data block (STR).
145324$6T/ DSKA ;STR name
DSKA: ;Label the STR
DSKA+1[ 145274,,10 ;Next STR,,STR number
DSKA+2[ 142314,,0 ;First UDB for STR,,K for CRASH.EXE
DSKA+3[ 1 ;Number of units in STR
DSKA+4[ 3,,41577 ;Quota words
DSKA+5[ 3,,41600 ; .
DSKA+6[ 0 ; .
DSKA+7[ 0 ; .
DSKA+10[ 0
DSKA+11[ 266532
DSKA+12[ 777777,,777014
DSKA+13[ 7 ;Mount count
DSKA+14[ 410,,512304 ;First retrieval pointer to MFD
.
.
.
There are two other methods for locating a disk structure. The first
is to start with SYSSTR and follow the links to each structure:
SYSSTR/ 247103,,1 ;Pointer in left half
247103$6T/ SIRS ;1st STR in linked list
247104[ 240137,,15
240137$6T/ BADP ;2nd STR in list
240140[ 110521,,14
110521$6T/ 7A ;3rd STR in list
110522[ 145324,,1
145324$6T/ DSKA ;4th STR in list
Or, with the file structure number, you can index into TABSTR:
TABSTR/ 777733,,1
TABSTR+1/ 110521
TABSTR+2/ 145324
145324$6T/ DSKA
Notice that the links started by SYSSTR are not in the same order as
TABSTR.
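Following the SYSSTR chain by hand is a simple loop, sketched here in Python. The memory contents are copied from the listings above and stand in for the dump. The names table replaces SIXBIT decoding, and the function name is hypothetical. The walk stops when it reaches an address that is not in the sample data.

```python
def lh(word):
    return (word >> 18) & 0o777777


def word(left, right):
    return (left << 18) | right


names = {0o247103: "SIRS", 0o240137: "BADP", 0o110521: "7A", 0o145324: "DSKA"}
mem = {
    0o247104: word(0o240137, 0o15),
    0o240140: word(0o110521, 0o14),
    0o110522: word(0o145324, 0o1),
    0o145325: word(0o145274, 0o10),
}
sysstr = word(0o247103, 1)


def walk_strs(sysstr, mem, names):
    """Return (name, right half of word 1) for each STR reachable from SYSSTR."""
    found, seen = [], set()
    addr = lh(sysstr)
    while addr in names and addr not in seen:
        seen.add(addr)                 # guard against a corrupted, looping chain
        link = mem[addr + 1]
        found.append((names[addr], link & 0o777777))
        addr = lh(link)
    return found


for name, number in walk_strs(sysstr, mem, names):
    print("%-6s %o" % (name, number))
```

The seen set matters in crash analysis, where a damaged link word can turn a list into a loop.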
You can use the UDB to find several other structures:
RAJ3: UNIQUE/ 0 ;Position wait queue
RAJ3: UNIPTR/ 0 ;-Length,,addr of swap SAT
RAJ3: UNISAB/ 7,,31271 ;First SAB in ring,,addr of SPT
From the UDB, you can find the KDB:
RAJ3: UDBKDB/ 136770 ;Ptr in UDB to KDB
136770$6T/ RAJ ;Controller name
RAJ: ;Label this
RAJ+1[ 76237 ;Next controller on system
RAJ+2[ 7 ;CPU accessibility mask
RAJ+3[ 136704 ;KDBCHN -- CHN
RAJ+4[ 777740,,137063 ;KDBIUN -- Initial pointer to units
.
.
.
You can get the channel data block from the KDB:
RAJ+KDBCHN/ 136704 ;KDB pointer to CHN
136704/ 0 ;-1 if channel idle
CHN: ;Label it
CHN+1/ 142750,,0 ;Next CHN,,last UDB with error
CHN+2/ 0 ;Error information
CHN+3/ 0
CHN+4/ 0
The other file information can be found by starting with SYSPPB and
following pointers to the correct PPB, NMB, and ACC. (DEVACC in the
DDB also points to the ACC.)
SYSPPB/ 120140,,0 ;Pointer to first PPB
120140[ 1,,4 ;Project,,programmer number
120141[ 120440,,0 ;Next PPB in system,,0
120440[ 1,,7 ;Project,,programmer number
PPB: ;Label it
PPB+1[ 120560,,0 ;Next PPB in system,,0
PPB+2[ 120450,,0 ;First UFB this PPN,,0
PPB+3[ 120460,,0 ;First NMB this PPN,,bits
PPB+4[ 6 ;Use count
PPB+5[ 410 ;KNO bits
PPB+6[ 410 ;YES bits
PPB+7[ 0 ;Interlock bits
Now you can look for the file USEJOB.BIN in the NMB:
120460$6T/ USAGE ;File name - USAGE
120461[ 120510,,0 ;Next NMB,,0
120510$6T/ FAILUR ;File name - FAILUR
120511[ 120540,,0 ;Next NMB,,0
120540$6T/ USEJOB ;File name - USEJOB
NMB: ;Label it
NMB+1[ 122670,,0 ;Next NMB,,0
NMB+2[ 26325 ;Compressed file pointer
NMB+3[ 120550,,425156 ;ACC,,file extension in SIXBIT
NMB+4[ 110000,,0 ;File structure number
NMB+5[ 400 ;KNO bits
NMB+6[ 400 ;YES bits
NMB+7[ 2 ;Use count
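The search through the PPB and NMB lists can be sketched in Python. The data is copied from the listings above. The offsets used (word 1 for the next block, word 3 of the PPB for the first NMB) are read from those listings, not from the monitor sources, and the function names are hypothetical.

```python
def lh(word):
    return (word >> 18) & 0o777777


def word(left, right):
    return (left << 18) | right


mem = {
    0o120140: word(1, 4), 0o120141: word(0o120440, 0),
    0o120440: word(1, 7), 0o120441: word(0o120560, 0),
    0o120443: word(0o120460, 0),
    0o120461: word(0o120510, 0),
    0o120511: word(0o120540, 0),
    0o120541: word(0o122670, 0),
}
nmb_names = {0o120460: "USAGE", 0o120510: "FAILUR", 0o120540: "USEJOB"}
sysppb = word(0o120140, 0)


def find_ppb(mem, sysppb, ppn):
    """Follow the PPB list until the block for the given PPN is found."""
    addr = lh(sysppb)
    while addr in mem:
        if mem[addr] == word(*ppn):
            return addr
        addr = lh(mem.get(addr + 1, 0))
    return None


def find_nmb(mem, names, ppb, filename):
    """Follow the NMB list of a PPB until the file name matches."""
    addr = lh(mem[ppb + 3])
    while addr in names:
        if names[addr] == filename:
            return addr
        addr = lh(mem.get(addr + 1, 0))
    return None


ppb = find_ppb(mem, sysppb, (1, 7))
print("PPB at %o, NMB at %o" % (ppb, find_nmb(mem, nmb_names, ppb, "USEJOB")))
```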
And finally, you can get to the ACC from the NMB:
120550[ 156 ;Highest block allocated
ACC: ;Label the ACC
ACC+1[ 120542,,200000 ;NMB,,bits
ACC+2[ 1100,,26325 ;First retrieval pointer
ACC+3[ 0 ;Dormant ACCs
ACC+4[ 110020,,120440 ;Bits,,PPB
ACC+5[ 222136,,410
ACC+6[ 145
ACC+7[ 55744,,332136
The ACC points back to both the NMB and PPB. Note, however, that the
ACC may point to another ACC, which may point to the NMB. This is
ascertained by examining the last digit of the left half of the NMB
word in the ACC (ACC+1 in this example). If the last digit is 2, as
in this example (120542), the left half points to an NMB. If the
digit is not 2, the left half points to another ACC.
The PPB also points to the UFB.
DDB22+DEVUFB/ 120450 ;DDB pointer to UFB
PPB+PPBUFB/ 120450,,0 ;PPB pointer to UFB
120450/ 377777,,700521 ;Total blocks left this UFD
UFB: ;Label it
UFB+1[ 122420,,775400 ;Next UFB,,bits
UFB+2[ 100,,52166 ;First retrieval PTR to this UFD
UFB+3[ 5 ;Bits
UFB+4[ 110000,,0 ;File structure number
UFB+5[ 104,,0 ;N if job N owns AU for this UFB
UFB+6[ 0 ;Non-zero if waiting for AU
UFB+7[ 0 ;=1 if UFD has empty data blocks
In all cases, check the Monitor Tables Descriptions and the source
listings to find the interconnections between the data structures and
how to interpret what is stored in them.
CHAPTER 5
ERROR HANDLING ROUTINES
The monitor reports hardware and software problems by displaying error
messages on the CTY, but these messages include only a small portion
of the information that the monitor stores in its database.
This chapter will show you how to take a message from the CTY and use
it to trace through the dump to obtain more information. This
involves working with the APR interrupt routine, the page fail trap
routine, and the stopcode routine. You can use this information to
deduce the scope and nature of the problem more accurately.
The error routines of the monitor are designed to handle both software
and hardware errors. When software errors are detected, control
usually jumps to an error handling routine for processing. Hardware
errors, however, can interrupt processing and sometimes halt the
system.
5.1 HARDWARE ERRORS
You can use the CTY message to trace an error to the actual hardware
that failed. The following types of hardware-related messages may
appear on the CTY.
The most serious hardware error is indicated by one of the following
messages:
?NON-RECOVERABLE MEMORY PARITY ERROR IN MONITOR
[CPU HALT]
or
?NON-EXISTENT MEMORY DETECTED IN MONITOR
[CPU HALT]
In this case, the error is so serious that the processor is halted
immediately and no further error processing can be done.
A second type of problem is an AR/ARX parity trap, indicated by the
following message:
************
CPU0 AR/ARX PARITY TRAP AT USER PC 401123 ON dd-mmm-yy
JOB 1 [SYSTAT] WAS RUNNING
PAGE FAIL WORD = 000000,,00011
MAPPED PAGE FAIL ADDRESS = 547000,,560271
INCORRECT CONTENTS = 000000,,000000
CONI PI, = 000000,,000377
RETRIES UNSUCCESSFUL, OFFENDING LOCATION ZEROED
************
Another type of parity trap is a page table parity trap, indicated by
the following:
************
CPU0 PAGE TABLE PARITY TRAP AT EXEC PC 414555 ON dd-mmm-yy hh:mm:ss
PAGE FAIL WORD = 000000,,00011
CONI PI, = 010000,,020377
************
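When several such reports must be compared with the values saved in the dump, it can help to turn the "name = LH,,RH" lines into numbers. The following Python sketch does that for text copied from the CTY log. The function name and the regular expression are illustrative assumptions, not part of any TOPS-10 tool.

```python
import re

# Matches lines of the form "NAME = octal,,octal".
LINE = re.compile(r"^\s*(.+?)\s*=\s*([0-7]+),,([0-7]+)\s*$")


def parse_report(text):
    """Return a dictionary of name -> 36-bit value for each 'name = LH,,RH' line."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            name = match.group(1).rstrip(",").strip()
            values[name] = (int(match.group(2), 8) << 18) | int(match.group(3), 8)
    return values


report = """\
CPU0 PAGE TABLE PARITY TRAP AT EXEC PC 414555 ON dd-mmm-yy hh:mm:ss
PAGE FAIL WORD = 000000,,00011
CONI PI, = 010000,,020377
"""
for name, value in parse_report(report).items():
    print("%s = %o" % (name, value))
```

Lines without an equal sign, such as the heading and the rows of asterisks, are ignored.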
A CPU interrupt due to a parity or NXM error is reported as:
************
CPU1 PARITY ERROR INTERRUPT AT USER PC 343413 ON dd-mmm-yy hh:mm:ss
JOB 2[WBKI] WAS RUNNING
CONI APR, = 003002,,312022
CONI PI, = 010000,,020377
ERROR INVOKED BY A message
************
This report can have several variations, depending on the CPU and the
specific error. The monitor can include any of these error messages:
CACHE WRITE-BACK FORCED BY A SWEEP INSTRUCTION.
CHANNEL STATUS WORD WRITE.
CHANNEL DATA WORD WRITE.
CHANNEL READ FROM MEMORY.
CHANNEL READ FROM CACHE.
CPU WRITE TO MEMORY (NOT CACHE).
CACHE WRITE-BACK FORCED BY A CPU WRITE.
CPU READ OR PAGE REFILL FROM MEMORY.
PAGE REFILL FROM CACHE.
After this or other errors, the monitor may also attempt to check for
problems by scanning memory for parity errors or nonexistent memory.
A memory scan can produce one of the following reports:
************
MEMORY PARITY SCAN INITIATED BY CPU0 ON dd-mmm-yy hh:mm:ss
NOTHING WAS FOUND
***********
************
NON-EXISTENT MEMORY SCAN INITIATED BY CHANNEL 1 ON CPU1 ON dd-mmm-yy
hh:mm:ss
NON-EXISTENT MEMORY DETECTED:
AT 314243 (PHYS.)
***********
The channel number (CHANNEL 1) listed in this message refers to the
software channel data block (CHN) number, not an RH20 channel.
Memory parity errors or nonexistent memory errors on a channel produce
a special message:
************
CPU1 CHANNEL MEMORY PARITY ERROR ON dd-mmm-yy hh:mm:ss
DEVICE IN USE IS RPA2
CHANNEL TYPE IS type
TERMINATION CHANNEL PROGRAM ADDRESS = 000477
TERMINATION DATA TRANSFER ADDRESS = 251470
LAST THREE CHANNEL COMMANDS EXECUTED ARE:
760000,,252777
760000,,251777
760000,,250777
************
The CHANNEL TYPE listed in this message may be DF10C, DX10, RH20,
CI20, NIA20, or SA10.
Hardware errors signal the software in either of two ways: by a
processor (APR) interrupt or by a page fail trap. APR interrupts are
usually generated on the highest PI level, because CPU errors are
serious and must interrupt other devices. When notified of such
errors, the monitor reads the hardware registers and takes the
appropriate action.
To obtain more information about the error and the state of the
monitor, you must examine the dump. It is important to understand how
the monitor handles hardware errors. The following sections describe
the routines in the monitor that handle errors.
5.1.1 APR Interrupt Routine
The routine to handle APR interrupts is APnINT, where n is the CPU
number. It is defined by a macro in COMMON, and handles all the
possible conditions that could cause a processor interrupt, which are:
o Cache-sweep-done
o Power fail
o Timer timeout (clock tick)
o I/O page fail error
o NXM error
o Cache directory parity error
o MB parity error
o Address parity error
o SBUS error
A clock tick or cache-sweep-done interrupt happens frequently and the
monitor deals with them quickly. The other conditions require more
extensive processing.
MB and NXM errors undergo even more analysis and eventually produce
one or more of these error reports: CPU parity error or NXM
interrupt, a memory scan, or the nonrecoverable error message.
5.1.2 Page Fail Trap Routine
Page fail traps are caused by one of the following conditions:
o Page fault
o Proprietary violation
o AR/ARX parity error (KL10 only)
o Page table parity error (KL10 only)
o Page refill failure (KL10 only)
o Address break (KL10 only)
o Illegal section number (KL10 only)
o Illegal indirection (KL10 only)
o Non-existent device or register (KS10 only)
o Hard memory error (KS10 only)
o NXM error (KS10 only)
Some of these conditions are the result of normal operations, such as
an address break, proprietary violation, or page fault. Others are
handled as error conditions. The page fail word describes the type of
page fault that occurred. The trap handler is located at SEILM in
APRSER.
The APR interrupt routine and the page fail trap routine use the same
push-down list, ERnPDL, once an error has been detected. The power
fail routine uses another push-down list, PWFPDL.
The channel error report is produced at the interrupt level of the
device that was doing the transfer. This report usually occurs for
disk and tape devices.
If a parity error is detected in fast memory, DRAM, or CRAM, the EBOX
stops immediately by turning off its clocks. The front-end processor
performs any diagnostic action that is necessary.
5.1.3 Saved Hardware Error Information
The error handling routines store information about hardware errors in
the CPU Data Block (CDB). Some of those locations in the CDB are:
.CnACN (APRSTS) CONI APR,
.CnAEF APR error flag
Parity Error Information:
o .CnTPE contains the total number of parity error words in
memory.
o .CnSPE contains the total number of nonreproducing parity
errors in memory.
o .CnMPA contains the memory parity address for this CPU.
o .CnMPW contains the memory parity word for this CPU.
o .CnMPP contains the memory parity PC for this CPU.
o .CnSB0 contains the SBUS Diag 0 instruction.
o .CnS0A contains the answer from the SBUS Diag 0 instruction.
o .CnSB1 contains the SBUS Diag Function 1 instruction.
o .CnS1A contains the answer from the SBUS Diag Function 1
instruction.
NXM Information:
o .CnTNE contains the total number of NXMs for this CPU.
o .CnSNE contains the total number of nonreproducible NXMs for
this CPU.
o .CnMNA contains the first address found with NXM.
AR/ARX Parity Information:
o .CnPBA contains the physical address that registered bad
parity on last AR/ARX parity trap.
o .CnTBD contains the contents of the bad word on the last
AR/ARX parity trap.
o .CnNPT contains the total number of AR/ARX parity traps.
o .CnAER contains the results of RDERA on a parity/NXM
interrupt.
o .CnPEF contains the results of CONI APR on a parity/NXM
interrupt.
o .CnPPC contains the PC on the last AR/ARX parity trap.
o .CnPFW contains the page fail word on the last parity trap.
o .CnHPT contains the number of hard AR/ARX parity traps.
o .CnSAR contains the number of soft AR/ARX parity traps.
o .CnPTP contains the total number of page table parity traps.
5.1.4 Hardware Error Checking
The KL10 processor is made up of the following hardware components:
the EBOX, the MBOX, and various interfaces and buses. The EBOX, short
for Execution BOX, is responsible for executing instructions. The
MBOX, short for Memory BOX, controls transfers to and from memory,
cache, channels, and the EBOX.
The EBOX is composed of the following:
o Instruction Register (IR) receives the instruction code from
the Arithmetic Logic Unit and passes it to the CRAM/DRAM for
execution.
o Dispatch RAM (DRAM) and Control RAM (CRAM) hold the microcode
that implements the PDP-10 instruction set.
o Arithmetic Logic Unit (ALU) is the major working area of the
processor. It has three fullword registers:
AR (Arithmetic Register)
BR (Buffer Register)
MQ (Multiplier/Quotient Register)
The first two registers also have fullword extensions: ARX
and BRX.
o Fast Memory (FM) contains the accumulators (ACs). The EBOX
has eight AC sets.
o Virtual memory address (VMA) keeps the PC and sends the
virtual address to the pager in the MBOX.
o Virtual memory address adder (VMA AD) helps the VMA in its
computations.
o Program Counter (PC) holds the virtual address of the next
instruction to be executed.
The MBOX is composed of:
o Pager (also known as the hardware page table), which holds
512 (MCA20) or 1024 (MCA25) mapping entries from the EPT or
UPT.
o Physical Memory Address register (PMA), which holds the
physical memory address of the next instruction.
o Cache (data and directory): high-speed semiconductor memory
that stores copies of data from regular memory in order to
speed up memory fetches. (MCA20 allows up to 2K of storage;
MCA25 allows up to 4K of storage.)
o Memory Buffer (MB), to control the flow of data to and from
cache, channels, memory, and the EBOX.
o Cache/MB interface, connecting cache to MB.
In addition, a number of buses and interfaces may be connected to the
MBOX, EBOX, and other parts of the system, such as:
o E/M interface connects the MBOX and EBOX.
o S/X BUS/MB interface connects the MBOX with the core/MOS
controllers. The DMA20 is on the SBUS and interfaces to
external memory.
o EBUS connects the EBOX to four DTE20s or eight RH20 slots
(which may contain RH20 or KLIPA/KLNI controllers) and the
DIA20/DIB20 interface to the traditional I/O bus devices.
Combinations of the following modules connect memory and MASSBUS
devices:
o Channel/MB interface connects MB with the channel controller.
o Channel controller controls the flow of data through the
CBUS.
o CBUS and CBUS interface handles data transfers that go
directly to the MBOX, bypassing the EBOX.
o RH20 MASSBUS controller connects the CBUS to the MASSBUS.
o MASSBUS is a standard bus for interfacing tapes and disks to
the KL.
o Device controller (BA10, TD10, RH10,...).
o I/O bus (PTP, PTR,...).
o Channel interfaces (DX10, DX20,...).
o CI20 port connecting the KL10 with the CI20 bus.
o NIA20 port connecting the KL10 with the Ethernet cable.
The KL10 dynamically generates parity in the following places:
o On the output side of the channel status RAMs
o On the output side of the AR
o Entering the pager from MB or AR
o Data stored in fast memory
o Data stored into the channel data buffers (18-bit parity is
generated)
Parity is checked after the following operations:
o On all requests from the MBOX
o Data leaves MB to go to the DMA20, pager, channel, cache, AR
or the arithmetic extender
o Data is paged out
o Data enters and leaves the RH20 or the MASSBUS
o Data enters the AR from the MBOX
o Data enters and leaves AR during DTE PI Level 0 interrupt
handling
o Data enters the ARX from the MBOX
o Data leaves fast memory
o Control leaves CRAM/DRAM
Errors detected through parity checking in the last two conditions
cause the KL (EBOX/MBOX) clock to halt immediately, provided that the
correct conditions have been enabled. The relationships among the
places where errors are detected and the conditions they evoke are
shown in the following table. Note that parity is generated by the
transmitting device. This table does not include power-fail
conditions.
Table 5-1: Hardware Errors
______________________________________________________________________
Component          Error                       Error Indicator
______________________________________________________________________
MA20               Incomplete cycle            SBUS error bit
                   Address parity error        Address parity bit
DMA20              Data parity error           SBUS error bit
                   Address parity error        Address parity bit
                   NXM error                   SBUS error bit
MB                 Data parity error           MB parity error bit
                   Nonexistent memory          NXM error bit
Pager              Page table parity error     Page fail trap
                                               code=25
                   Pager to cache directory    CD parity error bit
Arithmetic Logic   AR parity error             Page fail trap
(AR, ARX)                                      code=36 (for Exec)
                                               code=76 (for User)
                   ARX parity error            Page fail trap
                                               code=37 (for Exec)
                                               code=77 (for User)
                   AR/ARX/EBUS parity error*   I/O page fail bit
RH20               Data parity error           Device interrupt
DX10               Data parity error           Device interrupt

* This type of error includes any type of paging failure while PI
  CYCLE is set. The PI CYCLE is a microcode condition that is
  enabled when the microcode honors a PI request and is disabled
  when the first XPCW instruction occurs for Levels 1-7 or a Level
  0 request is completed.
______________________________________________________________________
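When you work through a crash, it can help to keep the page fail trap
codes from Table 5-1 in a small lookup table. The following Python
sketch is illustrative only and is not part of the monitor or its
tools. The names PAGE_FAIL_CODES and describe_page_fail_code are
hypothetical, the codes are assumed to be octal, and only the codes
that appear in Table 5-1 are included.

```python
# Page fail trap codes copied from Table 5-1 (assumed to be octal).
PAGE_FAIL_CODES = {
    0o25: "Page table parity error",
    0o36: "AR parity error (Exec)",
    0o76: "AR parity error (User)",
    0o37: "ARX parity error (Exec)",
    0o77: "ARX parity error (User)",
}

def describe_page_fail_code(code: int) -> str:
    """Return the Table 5-1 description for a page fail trap code."""
    return PAGE_FAIL_CODES.get(code, "not listed in Table 5-1")

print(describe_page_fail_code(0o36))
```

Codes that Table 5-1 does not list are reported as such rather than
guessed at.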
5.2 STOPCODES
Stopcodes are symbolic names representing errors detected by the
monitor. Stopcodes are generated by the STOPCD or BUG. macros. The
DIE routine records error information and initiates a reload, if
required. For a complete list of stopcodes, refer to the Stopcodes
Specification.
The CTY for each CPU in a multi-CPU configuration records the
stopcodes that occur on that CPU. You can use FILDDT to find the
module where a stopcode is defined. You can find a stopcode in the
crash file by looking for a symbol of the form S..name (for
3-character stopcode names) or just name (for 6-character stopcode
names). The following example shows how to find the module where a
KSW stopcode is defined:
S..KSW?
TAPSER G
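The naming rule above can be written as a small helper. This Python
sketch is illustrative only; the function name stopcode_symbol is
hypothetical and is not a FILDDT or monitor facility.

```python
def stopcode_symbol(name: str) -> str:
    """Return the symbol to look for in the crash file.

    Follows the rule stated in the text: a 3-character stopcode name
    appears as S..name, and a 6-character name appears as the name
    itself.
    """
    name = name.upper()
    if len(name) == 3:
        return "S.." + name
    if len(name) == 6:
        return name
    raise ValueError("expected a 3- or 6-character stopcode name")

print(stopcode_symbol("KSW"))
```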
Stopcodes are defined in many modules of the monitor, but they are
generated by the same macro, the STOPCD macro. The STOPCD macro is
called with:
STOPCD cont,type,name,disp
where:
cont    is the location to jump to after processing the error.
type    is the type of failure and determines the specific
        course of action. It can have one of the following
        values:
        o HALT
        o STOP
        o JOB
        o CPU
        o DEBUG
        o INFO
        o EVENT
name    is the unique stopcode name.
disp    is the address of the routine containing additional
        information, if appropriate.
The severity of the error is indicated by the type of stopcode. The
types of stopcodes are:
o HALT stopcodes occur after the most severe errors. The CPU
cannot continue automatically after a HALT, no additional
information is displayed on the CTY, and no information is
saved (no crash file is automatically created). HALT
stopcodes are also the least likely of the stopcodes to
occur, and are usually caused by recursive calls to the DIE
routine.
HALT stopcodes indicate serious problems that endanger
further system operation. The RSX-20F console front-end
(using the HALT.CMD file) gathers pertinent status and error
information.
o STOP stopcodes are also serious, and cause the system
(all CPUs) to put their status into memory and wait for the
policy CPU to dump and reload the monitor.
o JOB stopcodes are those that affect only one job but may
indicate problems in the system. If there is an interrupt in
progress, the system will be reloaded. If not, only the
faulty job will be terminated. Then a dump is taken and the
system continues.
o A CPU stopcode is important only for multiple-CPU systems.
This stopcode will stop only the current CPU, leaving the
others running. It acts as a STOP stopcode in any of the
following cases:
- Single-CPU systems
- Only one processor running in a multiple-CPU system
- If DF.CP1 is set in the DEBUGF word.
o A DEBUG stopcode affects the system in different ways,
depending on the contents of the DEBUGF word (short for DEBUG
Flags). By setting certain bits in this word, a system
programmer can control the effect of certain stopcodes and
the manner in which the system is reloaded. The DEBUGF flags
are listed in Section 6.3.
o An INFO stopcode displays a message on the CTY and rings the
terminal bell, informing the operator of an event that may be
of interest. Most INFO stopcodes are harmless and can be
ignored. They do not halt the system or job, do not initiate
a memory dump, and do not cause a system reload.
o An EVENT stopcode displays a message on the CTY, similar to
an INFO stopcode, but does not ring the terminal bell.
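The conditions under which a CPU stopcode is treated as a STOP
stopcode, and the two outcomes of a JOB stopcode, can be sketched as
follows. This is a Python illustration of the rules stated above, not
monitor code. The function names and arguments are hypothetical, and
the position of DF.CP1 (Bit 4) is taken from Section 6.3.

```python
# Bit 0 is the sign (leftmost) bit of the 36-bit DEBUGF word,
# so Bit n corresponds to the mask 1 << (35 - n).
DF_CP1 = 1 << (35 - 4)

def cpu_stopcode_acts_as_stop(cpus_configured: int,
                              cpus_running: int,
                              debugf: int) -> bool:
    """True when a CPU stopcode stops the whole system."""
    return (cpus_configured == 1          # single-CPU system
            or cpus_running == 1          # only one processor running
            or bool(debugf & DF_CP1))     # DF.CP1 set in DEBUGF

def job_stopcode_action(interrupt_in_progress: bool) -> str:
    """Outcome of a JOB stopcode as described in the text."""
    if interrupt_in_progress:
        return "reload system"
    return "terminate job, dump, continue"

print(cpu_stopcode_acts_as_stop(2, 2, DF_CP1))
```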
5.2.1 Stopcode Processing
The DIE routine in ERRCON processes stopcodes in the following manner:
1. Increments .CnDWD to indicate that this CPU has died and to
protect the code from being entered twice by that CPU.
2. Saves the PI status in .CnCPI and turns off the PI system.
3. Saves AC Blocks 0, 1, 2, 3, and 4 in memory.
4. Stores stopcode PC in %SYSPC and .CnSNM.
5. Sets up error stack from ERnPDL.
6. Creates CPU and device status block data using RCDSTB, and
calls DAEMON to output those buffers.
7. Initiates a cache sweep and waits with control in the ACs
until the sweep is finished.
8. Enters the secondary protocol.
9. Attempts to get the DIE interlock.
10. Prints stopcode information on CTY.
11. Dispatches to the routine that will take the dump and handle
the specific type of stopcode.
INFO and EVENT stopcodes perform all the functions listed here, except
that they do not turn off the PI system, do not halt the system, and
do not perform a dump and reload. The EVENT output on the CTY is
formatted differently from the other types of stopcodes.
5.2.2 Continuing from Stopcodes
JOB and DEBUG stopcodes do not ordinarily crash the system. They
allow error collection to be done, and then the system can continue.
Whenever a JOB or DEBUG stopcode occurs, the default action of the
monitor is to dump memory to disk for later analysis. This is known
as a continuable stopcode dump and is handled by BOOT. This allows
the system to continue to do work even though the state of the machine
is being saved.
The majority of stopcodes are caused by a corruption of some portion
of the monitor's database. Often, a corrupted piece of data will
cause several stopcodes, one right after the other. However, the
first dump is the most important. When you are analyzing a series of
crashes, look at the first crash in the series.
If two or more crashes have the same time stamp, you should look at
the dump with Bit 8 clear in the DEBUGF word. You can probably ignore
the other dump(s). Refer to Section 6.3 for more information about
DEBUGF flags.
5.2.3 Special Stopcodes
Certain stopcodes occur more frequently because they represent a wide
range of problems, which makes them more difficult to debug. The
stopcodes of this type that you should be aware of are KAF, IME, UIL,
and EUE. The causes listed in the following paragraphs are not
exhaustive, but they illustrate how such stopcodes can occur.
Keep-Alive Fail (KAF) stopcodes occur when the system is hung or
looping. In this situation, you cannot get response from the
terminals, there are no jobs running, and no I/O is being done.
Eventually, the front-end, RSX-20F, realizes the keep-alive count has
expired, and forces the KL to execute the instruction in physical
location 71 of memory, [email protected], which stores the contents of P in
KFnSVP, and issues the KAF stopcode. The address (a double-word PC)
of the instruction that was being executed is stored at APnKAF and
APnKAF+1.
A KAF occurs when something prevents the processor from reaching clock
level, thus preventing the keep-alive count from being updated and
scheduling from being done. This can occur if a process at a higher
PI level never exits, which could be caused by one of the following:
o A higher level interrupt goes into an infinite loop.
o A higher level interrupt does not clear an interrupt signal
when the interrupt routine exits. The signal, being
constantly asserted, causes one interrupt after another.
o The clock does not tick because it has malfunctioned.
o The clock does not tick because the PI system has been
disabled.
o A monitor routine does not release an interlock.
o A CPU in a multiple-CPU system does not release a CPU
interlock.
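The general idea of a keep-alive check can be sketched as follows.
This Python model is illustrative only: it assumes that a watcher
samples a counter that the monitored processor advances at clock
level, and it declares a failure when the counter stays unchanged for
a number of consecutive samples. The class name, the limit, and the
sampling scheme are hypothetical and do not describe the actual
RSX-20F protocol.

```python
class KeepAliveWatchdog:
    """Declare failure when a sampled counter stops changing."""

    def __init__(self, limit: int):
        self.limit = limit        # unchanged samples tolerated
        self.last_count = None    # most recent counter value seen
        self.stalled = 0          # consecutive unchanged samples

    def check(self, count: int) -> bool:
        """Record one sample; return True when keep-alive has failed."""
        if count != self.last_count:
            self.last_count = count
            self.stalled = 0
            return False
        self.stalled += 1
        return self.stalled >= self.limit

watchdog = KeepAliveWatchdog(limit=3)
samples = [1, 2, 2, 2, 2]   # the counter stops advancing after 2
print([watchdog.check(s) for s in samples])
```

Any of the causes listed above would show up in this model the same
way: clock level is never reached, so the counter stops advancing.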
IME stands for Illegal Memory Reference from Executive and is issued
when an unexpected page fault occurs in exec mode. Some of the
potential causes for an IME include:
o An attempt to write into the monitor's high segment.
o An attempt to reference data mapped through a UPT that is not
addressable.
o Invalid indexing because accumulators were misused.
To solve IMEs, you can look at the following locations in the UPT:
o .USPFW (location 500) contains the page fail word.
o .USPFP (501) contains the flags in the left half.
o .USPFN (502) contains the PC of the page fail instruction.
The CDB also contains some relevant information, referenced by the
following symbols:
o .CnAPC contains the APR error or trap PC on this CPU.
o .CnPFW contains the page fail word on traps to SEILM.
o .CnPPI contains the results of CONI PI, on a parity/NXM trap.
o .CnTCX contains the page fail word context word on traps to
SEILM.
EUE stands for Executive UUO Error and occurs when the monitor
attempts to execute an illegal UUO (usually with an opcode of 0).
This stopcode is usually the result of the monitor branching to an
address that contains data instead of an instruction. Its causes are
very similar to those of an IME. The same problem may produce an EUE
one time and an IME another time, depending on specific conditions.
To solve EUE and UIL stopcodes, you should look at the contents of the
following locations in the UPT:
o .USMUO contains the flags and left half of the UUO.
o .USMUP contains the address of the UUO routine.
o .USMUE contains the effective address half of the UUO.
o .USUPF contains the process context word at the time of the
UUO.
5.3 ERRORS DETECTED BY RSX-20F
When the RSX-20F console front-end detects certain KL error
conditions, it collects data using command files (sometimes called
TAKE files). The error conditions and the command file for each are
listed below.
The command files are used to gather status and error data for special
cases, and (on single-CPU systems) to assist in system continuation
after a stopcode.
When the RSX-20F reload-enable flag is set, the following command
files are automatically executed for the following conditions:
File         Error Condition
CLOCK.CMD    Field service probe clock error stop
CRAM.CMD     Control RAM (CRAM) clock error stop
DRAM.CMD     Dispatch RAM clock error stop
EBUS.CMD     EBUS parity error
FMPAR.CMD    Fast memory parity clock error stop
DEX.CMD      Deposit/Examine failure
HALT.CMD     KL executes HALT instruction
TIMEO.CMD    Protocol timeout condition
KPALV.CMD    Keep-alive failed condition (*)
DUMP.CMD     Optional system hung file
----------
* When a Keep-Alive Fail occurs, the KPALV.CMD file is not used
immediately. Instead, RSX-20F attempts to reload the monitor at
location 71 (described in Section 5.2.3). If the front-end fails to
reload the monitor, RSX-20F takes a Keep Alive Fail and executes the
KPALV.CMD file. However, if the Retry-Enable Flag (which is set, by
default) is cleared, the KPALV.CMD file is executed immediately
without trying a reload.
The KPALV.CMD file is useful when the system hangs without doing any
productive work. You can execute KPALV.CMD to gather status
information and force a dump. To invoke KPALV.CMD, type the
following commands on the CTY:
^\              ;<CTRL-backslash>
PAR>TAKE KPALV  ;initiates the .CMD file
CHAPTER 6
DEBUGGING THE MONITOR
There are two ways to make corrections to the monitor. The first
method is to alter the running monitor using the monitor-specific
FILDDT. You can use this method when the changes are small and it is
unlikely that the system will crash due to patching errors. The
second method involves taking the system standalone and loading the
monitor with EDDT.
6.1 PATCHING WITH FILDDT
The monitor-specific FILDDT contains functions that allow you to
change or patch the running monitor. To run FILDDT and patch the
monitor, you must use the following commands:
.R MONDDT
File: /M/P
The /M switch indicates that all Examine and Deposit functions will
refer to the running monitor. The /P switch allows you to patch the
monitor. To use these switches, your job must have PEEK and POKE
privileges.
Often the changes to be added in the monitor do not fit easily into
the existing code. To add several lines of code, you must access the
pre-allocated patching space that is resident in the running monitor.
The patching space starts at the address pointed to by the symbol
PATCH. The number of words reserved for patching space is assembled
into the monitor module PATCH.MAC (the symbol is PATSIZ), but the
patch area is usually 50 (octal) words long. It is recommended that
large changes be made directly to monitor sources, not to the running
monitor.
CAUTION
When you install a change to the running monitor,
remember that the monitor code should not dispatch to
the patched location until you have installed the
entire patch. Therefore, the instruction that
dispatches to the changed code should be the last
instruction you install. It is recommended that you
use the $< command to FILDDT specifying PATCH as the
patching area.
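The ordering rule in this caution can be modeled in a few lines of
Python. The sketch below is illustrative only and does not drive
FILDDT: memory is modeled as a dictionary, instruction words are
plain strings, and the function name, addresses, and instruction text
are hypothetical. The size limit uses the "usually 50 (octal) words"
figure from the text; the real limit is PATSIZ.

```python
PATCH_AREA_WORDS = 0o50   # usual patch area size per the text

def install_patch(memory, patch_base, patch_words, hook_addr, hook_word):
    """Write the patch body first and the dispatching word last.

    Returns the addresses in the order they were written, so the
    ordering can be checked.
    """
    if len(patch_words) > PATCH_AREA_WORDS:
        raise ValueError("patch does not fit in the patch area")
    written = []
    for offset, word in enumerate(patch_words):
        memory[patch_base + offset] = word
        written.append(patch_base + offset)
    # Only now is the existing code redirected into the patch.
    memory[hook_addr] = hook_word
    written.append(hook_addr)
    return written

core = {}
order = install_patch(core, 0o1000, ["word 1", "word 2"],
                      0o500, "dispatch to patch")
print([oct(a) for a in order])
```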
6.2 USING EDDT
EDDT is a version of DDT that runs in both user and exec modes. EDDT
is part of the monitor, in the sense that it resides in the monitor's
.EXE file and is loaded into core with the monitor. The command to
BOOT to enable debugging with EDDT is:
BOOT>monitor-filespec/EDDT
The /EDDT switch instructs BOOT to start at the EDDT start address
rather than the monitor's normal starting address. You can type /EDDT
or /START:401.
When BOOT starts the monitor at location 401, the CPU is running
unmapped. In this mode, EDDT could run, but the symbol table is
inaccessible. Since this situation would provide only limited
debugging capabilities, the monitor sets up minimal page mapping.
When this is done, all monitor code and the symbol table will be
accessible from EDDT. The monitor then jumps to EDDT.
When EDDT starts, it displays "EDDT" on the CTY. It behaves much like
user-mode DDT: there is no prompt, and the command syntax is nearly
identical. For more information on the exec-mode debugging
commands, refer to the TOPS-10 DDT Manual.
6.2.1 Starting the Monitor
When the monitor is loaded into core, data storage mapping and devices
have not been configured. However, most of the useful information on
the status of the monitor is contained in the monitor's high segment.
The monitor will be mapped after you start it, but normally the
monitor's symbol table, EDDT, and the SYSINI locations are cleared
after initialization. You can preserve the symbol table, EDDT, and
SYSINI initialization code by starting the monitor at location DEBUG,
using the following command to EDDT:
DEBUG$G
On a normal startup, the monitor discards its symbol table, EDDT, and
SYSINI initialization code. The address space is reclaimed for the
monitor's Section 0 free core pool. However, when you use EDDT to
load the monitor (using the DEBUG$G command), this address space is
preserved, and the symbol table is moved into Section 35 (KL10) or out
of the monitor's address space into unmapped core (KS10). A pointer
to the physical address of the symbol table is stored in the Exec Data
Vector for use by EDDT.
6.2.2 Breakpoints
You can insert breakpoints any time after EDDT has started. Unless you
are debugging system initialization code, it is useful to set an
initial breakpoint at the label "HIGHIN". When this point in the code
has been reached, the monitor is ready to run. That is, all other
CPUs have been started, channels can be autoconfigured, and so forth.
After the monitor starts running, you can type <CTRL/D> on any CTY to
enter EDDT on the current CPU. SCNSER intercepts the <CTRL/D>
character at interrupt level, saves the contents of the current AC
block, and executes an unsolicited breakpoint entry into EDDT. Then
you can type any valid EDDT command on the CTY. You can resume
monitor execution by typing $P. SCNSER will ignore the <CTRL/D>
character that caused control to pass to EDDT. The <CTRL/D> facility
is controlled under timesharing by the use of the following monitor
command on the CTY:
.SET EDDT BREAKPOINT [OFF/ON]
The default setting for this command is ON when Bit 0 is set in the
DEBUGF word.
6.3 DEBUGF FLAGS
The DEBUGF word contains the following flags, which can be set and
cleared using OPR commands. The most useful flag for the systems
analyst is Bit 0, the sign bit. This flag indicates that EDDT is
loaded for debugging the monitor and enables breakpointing monitor
code.
Bit   Name     Description
0     DF.SBD   System being debugged (EDDT loaded).
1     DF.RDC   Reload on DEBUG stopcodes.
2     DF.RJE   Reload on JOB stopcodes.
3     DF.NAR   Do not automatically reload.
4     DF.CP1   Stop entire system on any CPU stopcode.
5     DF.DDC   Do not output a memory dump on a DEBUG stopcode.
6     DF.DJE   Do not output a memory dump on a JOB stopcode.
7     DF.DCP   Do not output a memory dump on a CPU stopcode.
8     DF.RQC   Start CRSCPY program to copy the previous crash
               file at the time of the next clock tick on the
               policy CPU.
9     DF.RQK   Call KDPLDR on the next clock tick.
10    DF.RQN   Call KNILDR on the next clock tick (obsolete).
11    DF.WFL   Copy output to FRCLIN at system CTY.
12    DF.DDC   Disable next CRSCPY request.
13    DF.RIP   Reload in progress (RECON. function .RCRLD)
14    DF.RAD   Reload after dump (don't dump twice in BOOT).
15    DF.RLD   Stopcode caused by a reload (used CRSCPY).
18    DF.BP0   Can enter EDDT on CPU0 using XCT .C0DDT.
19    DF.BP1   Can enter EDDT on CPU1 using XCT .C1DDT.
20    DF.BP2   Can enter EDDT on CPU2 using XCT .C2DDT.
21    DF.BP3   Can enter EDDT on CPU3 using XCT .C3DDT.
22    DF.BP4   Can enter EDDT on CPU4 using XCT .C4DDT.
23    DF.BP5   Can enter EDDT on CPU5 using XCT .C5DDT.
For example, suppose you want to stop the system before reloading to
reconfigure the hardware. To do this, Bit 3 in the DEBUGF word should
be set. To disable automatic reloads, run the OPR program and type
the following commands to CONFIG:
.R OPR<RET>
OPR>ENTER CONFIG<RET>
CONFIG>SET NO AUTO-RELOAD<RET>
CONFIG>EXIT<RET>
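When you read the DEBUGF word out of a crash, remember that the bit
numbers in the table above count from the sign bit: Bit 0 is the
leftmost bit of the 36-bit word. The following Python sketch decodes
a DEBUGF value into the flag names listed in the table. It is an
illustration only; the names DEBUGF_BITS, bit_mask, and decode_debugf
are hypothetical, and the flag names are copied as printed in the
table.

```python
# Bit number -> flag name, as printed in the DEBUGF table.
DEBUGF_BITS = {
    0: "DF.SBD", 1: "DF.RDC", 2: "DF.RJE", 3: "DF.NAR",
    4: "DF.CP1", 5: "DF.DDC", 6: "DF.DJE", 7: "DF.DCP",
    8: "DF.RQC", 9: "DF.RQK", 10: "DF.RQN", 11: "DF.WFL",
    12: "DF.DDC", 13: "DF.RIP", 14: "DF.RAD", 15: "DF.RLD",
    18: "DF.BP0", 19: "DF.BP1", 20: "DF.BP2", 21: "DF.BP3",
    22: "DF.BP4", 23: "DF.BP5",
}

def bit_mask(bit: int) -> int:
    """Mask for PDP-10 bit number `bit` in a 36-bit word."""
    return 1 << (35 - bit)

def decode_debugf(word: int):
    """Return (bit, name) pairs for every listed flag set in word."""
    return [(bit, name)
            for bit, name in sorted(DEBUGF_BITS.items())
            if word & bit_mask(bit)]

# Sign bit plus Bit 3: EDDT loaded and automatic reloads disabled.
print(decode_debugf(0o440000000000))
```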
6.4 MULTI-CPU ENVIRONMENT
Debugging a multiple-CPU system requires special considerations. EDDT
performs all terminal I/O for the CTY that encountered the breakpoint.
It is not unusual to use all CTYs on the system during a debugging
session.
When a CPU stops at a breakpoint, normally the other CPU(s) will
continue to run. If the breakpoint occurred on a non-policy CPU, the
CTY on the policy CPU will report the following message:
problem on CPUn ...
However, if the breakpoint occurs on the policy CPU, a role switch
occurs and another CPU assumes the role of the policy CPU. Although
this behavior is desirable during timesharing, the role switch makes
it very difficult to debug a multiple-CPU monitor when more than one
CPU is running. Also, when the CPUs in the system detect the fact
that one of the CPUs is not running, interlocks owned by the halted
CPU are broken. If the CPU was actually paused at a breakpoint, and
then continued, CIB stopcodes can occur.
To prevent role switching, a flag (DEBCPU) is set to the number of the
CPU on which you typed DEBUG$G. The BRKLOK and BECOM0 routines check
DEBCPU to prevent possible role switches. This may be circumvented by
patching a JFCL at DDTCPU prior to typing DEBUG$G.
The BIGBEN routine sends monitor messages to the CTY once per hour.
The following patch circumvents it:
BIGBEN/POPJ P,
6.5 CAUTIONS
Remember, EDDT provides little protection against user errors. Keep
the following points in mind when you are debugging a running monitor:
o EDDT cannot execute a UUO when you issue the $X and $$X
commands. This is a restriction. Attempts to do this on a
KL usually result in a PI Level 0 Interrupt Error from
RSX-20F. The monitor performs some UUOs internally, in the
SAVE/GET code, and the CLOSE and FINISH commands.
o You can change the AC block for EDDT when the monitor is at a
breakpoint and you wish to deposit data into an AC block
other than the current one. Use the following command to
change to the AC block you specify (n):
n$4U
Do not attempt to use AC Blocks 6 or 7 on a KL10. This will
crash the system because the microcode uses portions of AC
Block 6 and all of AC Block 7.
o On a multiple-CPU system, there are locations in ONCMOD and
SYSINI where the CPU must wait for another CPU to finish an
operation. If that other CPU is halted at a breakpoint, the
waiting CPU will time out. You must devise specific patches
at CPUXCT to prevent this situation.
APPENDIX A
ADDRESS SPACE LAYOUT
Monitor Code Section Layout
NOTE
The specifications shown in the following figures are
subject to change without notice. Addresses are shown
for comparison purposes only; actual addresses may be
different depending on your specific monitor
configuration.
+-------------------------------------------+
00,,000000 | Traditional "Low Seg" |
| COMxxx data structures, Exec page |
| maps, Interrupt vectors & code, |
00,,073777 | Prototype DDBs, Job (JBT) Tables |
|-------------------------------------------|
00,,074000 | PTY DDBs, TTY DDBs, Monitor free |
| core, KDBs, UDBs, PDBs, Context |
00,,245777 | blocks, etc. |
|-------------------------------------------|
00,,246000 | Void |
00,,327777 | |
|-------------------------------------------|
00,,330347 | Common Subroutines |
00,,334777 | |
|-------------------------------------------|
00,,335000 | Void |
00,,337777 | |
|-------------------------------------------|
00,,340000 | Traditional "High Seg", Pure code, |
| UUO calls, Device drivers, IPCF, |
00,,726777 | ENQ/DEQ, ANF, etc. |
|-------------------------------------------|
00,,727000 | Void |
00,,733777 | |
|-------------------------------------------|
00,,734000 | Per-CPU CDB mapping |
00,,735777 | |
|-------------------------------------------|
00,,736000 | Void |
00,,737777 | |
|-------------------------------------------|
00,,740000 | Job Per-process mapping |
| UPT, Extended-exec-PDL, Disk DDBs, |
| TMPCOR, pathological names, .TEMP, |
00,,777777 | .JBPK, etc. map slots |
|-------------------------------------------|
01,,000000 | Monitor Section One |
01,,777777 | (mapped identically to Section Zero) |
+-------------------------------------------+
Figure A-1: Monitor Code Section Layout
DECnet Code Section Layout
+-------------------------------------------+
02,,000000 | Traditional "Low Seg" |
| COMxxx data structures, Exec page |
| maps, Interrupt vectors & code, |
02,,073777 | Prototype DDBs, Job (JBT) Tables |
|-------------------------------------------|
02,,074000 | PTY DDBs, TTY DDBs, Monitor free |
| core, KDBs, UDBs, PDBs, Context |
02,,245777 | blocks, etc. |
|-------------------------------------------|
02,,246000 | Void |
02,,327777 | |
|-------------------------------------------|
02,,330000 | Common Subroutines |
02,,334777 | |
|-------------------------------------------|
02,,335000 | Void |
02,,627777 | |
|-------------------------------------------|
02,,630000 | "Sky Hi Seg" |
02,,717777 | DECnet code |
|-------------------------------------------|
02,,720000 | Void |
02,,733777 | |
|-------------------------------------------|
02,,734000 | Per-CPU CDB mapping |
02,,735777 | |
|-------------------------------------------|
02,,736000 | Void |
02,,737777 | |
|-------------------------------------------|
02,,740000 | Job Per-process mapping |
| UPT, Extended-exec-PDL |
| Disk DDBs, TMPCOR |
| Pathological names |
02,,777777 | .TEMP, .JBPK, etc. map slots |
+-------------------------------------------+
Figure A-2: DECnet Code Section Layout
Monitor Data Section 3 Layout
+-------------------------------------------+
03,,000000 | PAGTAB |
03,,017777 | |
|-------------------------------------------|
03,,020000 | PT2TAB |
03,,037777 | |
|-------------------------------------------|
03,,040000 | MEMTAB |
03,,057777 | |
|-------------------------------------------|
03,,060000 | Disk Cache |
03,,174777 | "NZS" free core |
|-------------------------------------------|
03,,175000 | Void |
03,,277777 | |
|-------------------------------------------|
03,,300000 | DECnet "MB" pool |
03,,407777 | |
|-------------------------------------------|
03,,410000 | DECnet free pool |
03,,517777 | |
|-------------------------------------------|
03,,520000 | DECnet name-to-address |
03,,543777 | translation table |
|-------------------------------------------|
03,,544000 | KLNI free pool |
03,,547777 | |
|-------------------------------------------|
03,,550000 | LAT free pool |
03,,553777 | |
|-------------------------------------------|
03,,554000 | Void |
03,,777777 | |
+-------------------------------------------+
Figure A-3: Monitor Data Section 3 Layout
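When an address from a crash falls in Section 3, the figure can be
used as a lookup table. The following Python sketch parses an address
written in the section,,offset notation of the figure (both halves
octal) and names the region that contains it. It is illustrative
only: the function names are hypothetical, and the boundaries are
copied from Figure A-3, so they may not match your specific monitor
configuration.

```python
# (first offset, last offset, contents), copied from Figure A-3.
SECTION3_LAYOUT = [
    (0o000000, 0o017777, "PAGTAB"),
    (0o020000, 0o037777, "PT2TAB"),
    (0o040000, 0o057777, "MEMTAB"),
    (0o060000, 0o174777, 'Disk Cache, "NZS" free core'),
    (0o175000, 0o277777, "Void"),
    (0o300000, 0o407777, 'DECnet "MB" pool'),
    (0o410000, 0o517777, "DECnet free pool"),
    (0o520000, 0o543777, "DECnet name-to-address translation table"),
    (0o544000, 0o547777, "KLNI free pool"),
    (0o550000, 0o553777, "LAT free pool"),
    (0o554000, 0o777777, "Void"),
]

def parse_address(text: str):
    """Split 'ss,,oooooo' into (section, offset), both read as octal."""
    section, offset = text.split(",,")
    return int(section, 8), int(offset, 8)

def section3_region(text: str) -> str:
    """Name the Figure A-3 region that contains a Section 3 address."""
    section, offset = parse_address(text)
    if section != 3:
        raise ValueError("only Section 3 is tabulated here")
    for low, high, name in SECTION3_LAYOUT:
        if low <= offset <= high:
            return name
    raise ValueError("offset is outside an 18-bit section")

print(section3_region("03,,300123"))
```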
Monitor Data Sections 4,5 Layout
+-------------------------------------------+
04,,000000 | SCNSER TTY LDBs & Chunks |
04,,051777 | |
|-------------------------------------------|
04,,052000 | Void |
04,,777777 | |
+-------------------------------------------+
+-------------------------------------------+
05,,000000 | SCA Free pool |
05,,004777 | |
|-------------------------------------------|
05,,005000 | SCA Datagram buffers |
05,,121777 | |
|-------------------------------------------|
05,,122000 | SCA Message buffers |
05,,165777 | |
|-------------------------------------------|
05,,166000 | SCA Connect ID table |
05,,166777 | |
|-------------------------------------------|
05,,170000 | KLIPA BSDs |
05,,171777 | |
|-------------------------------------------|
05,,172000 | KLIPA BHDs |
05,,172777 | |
|-------------------------------------------|
05,,173000 | LAT "extra allocation" |
05,,176777 | |
|-------------------------------------------|
05,,177000 | Void |
05,,777777 | |
+-------------------------------------------+
Figure A-4: Monitor Data Sections 4,5 Layout
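The section 5 map in Figure A-4 can be read as a lookup table: given an
address from a dump, find the row whose range contains it. The Python
sketch below is illustrative only and is not part of the monitor or of
FILDDT. The names parse_addr, SECTION_5, and find_region are
hypothetical, and the ranges are copied from the figure as printed. The
figure lists nothing for 05,,167000 through 05,,167777, so the sketch
returns None for that range.

```python
def parse_addr(text):
    """Split 'SS,,OOOOOO' (octal section,,offset) into an (int, int) pair."""
    section, offset = text.split(",,")
    return int(section, 8), int(offset, 8)


# (first, last, contents) rows copied from Figure A-4, section 5.
SECTION_5 = [
    ("05,,000000", "05,,004777", "SCA Free pool"),
    ("05,,005000", "05,,121777", "SCA Datagram buffers"),
    ("05,,122000", "05,,165777", "SCA Message buffers"),
    ("05,,166000", "05,,166777", "SCA Connect ID table"),
    ("05,,170000", "05,,171777", "KLIPA BSDs"),
    ("05,,172000", "05,,172777", "KLIPA BHDs"),
    ("05,,173000", "05,,176777", 'LAT "extra allocation"'),
    ("05,,177000", "05,,777777", "Void"),
]


def find_region(text, table=SECTION_5):
    """Return the contents label for an address, or None if unlisted."""
    addr = parse_addr(text)
    for first, last, contents in table:
        # Tuples compare section first, then offset.
        if parse_addr(first) <= addr <= parse_addr(last):
            return contents
    return None


print(find_region("05,,123456"))  # falls in the message buffer range
```

The same table shape would work for the other sections; only the rows
change.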
Monitor Data Sections 6,7 Layout
+-------------------------------------------+
06,,000000 | BOOT |
06,,007500 | |
|-------------------------------------------|
06,,007500 | DX10 (DXMPA) ucode |
06,,012250 | |
|-------------------------------------------|
06,,012250 | DX20 (DXMCA) ucode |
06,,014650 | |
|-------------------------------------------|
06,,014650 | DX20 (DXMCD) ucode |
06,,017250 | |
|-------------------------------------------|
06,,017250 | KLIPA (KLPCOD) ucode |
06,,034530 | |
|-------------------------------------------|
06,,034530 | KLNI (KNICOD) ucode |
06,,051777 | |
|-------------------------------------------|
06,,052000 | Void |
06,,777777 | |
+-------------------------------------------+
+-------------------------------------------+
07,,000000 | Swapping SATs |
07,,003777 | |
|-------------------------------------------|
07,,004000 | Disk SATs |
07,,076777 | |
|-------------------------------------------|
07,,077000 | SAT free core |
07,,122777 | |
|-------------------------------------------|
07,,123000 | Void |
07,,777777 | |
+-------------------------------------------+
Figure A-5: Monitor Data Sections 6,7 Layout
Monitor Data Sections 35,36,37 Layout
+-------------------------------------------+
35,,000000 | Symbol table for EDDT while |
35,,252777 | debugging, otherwise void. |
|-------------------------------------------|
35,,253000 | Void |
35,,777777 | |
+-------------------------------------------+
+-------------------------------------------+
36,,000000 | SNOOPY Scratch space |
36,,777777 | |
+-------------------------------------------+
+-------------------------------------------+
37,,000000 | Void |
37,,677777 | |
|-------------------------------------------|
37,,700000 | Exec section maps |
37,,737777 | |
|-------------------------------------------|
37,,740000 | User section maps |
37,,777777 | |
+-------------------------------------------+
Figure A-6: Monitor Data Sections 35,36,37 Layout
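The size of any region in these figures follows from its two bounds:
last offset minus first offset, plus one, gives the length in words. The
Python sketch below also converts words to pages using a 512-word (1000
octal) page. That page size is not stated in this appendix and is
supplied here as background. The names PAGE_WORDS and region_size are
hypothetical, and the sketch assumes both bounds lie in the same section.

```python
PAGE_WORDS = 0o1000  # assumed page size: 512 words


def region_size(first, last):
    """Return (words, whole_pages) for a range given as 'SS,,OOOOOO' strings."""
    first_sec, first_off = (int(part, 8) for part in first.split(",,"))
    last_sec, last_off = (int(part, 8) for part in last.split(",,"))
    if first_sec != last_sec or last_off < first_off:
        raise ValueError("range must lie within one section")
    words = last_off - first_off + 1
    return words, words // PAGE_WORDS


# Exec section maps, bounds taken from Figure A-6.
words, pages = region_size("37,,700000", "37,,737777")
print(f"{words:o} (octal) words, {pages} pages")
```

For the exec section maps this works out to 40000 octal words, or 32
pages.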
GLOSSARY
The table below provides an alphabetized list of the abbreviations and
acronyms used in this manual, with expanded names to define them.
Table Gloss-1: Glossary of Acronyms
______________________________________________________________________
Acronym Meaning
______________________________________________________________________
AC Accumulator
APR Arithmetic Processor
BR Buffer Register
CDB Central Processing Unit Data Block
CFP Compressed File Pointer
CHN Channel Data Block
CI Computer Interconnect
CPU Central Processing Unit
CRAM Control Random-Access Memory
CTY Console Terminal
CX A job context
DDB Device Data Block
DDT DEC Debugging Tool
DRAM Dispatch Random-Access Memory
EBR Exec Base Register
EPT Exec Process Table
EVM Exec Virtual Memory
FM Fast Memory
I/O Input/Output
IORB Input/Output Request Block
IPCF Interprocess Communication Facility
IR Instruction Register
JDA Job Device Assignment table
KDB Controller Data Block
KON Disk Controller Data Block
LDB Line Data Block
MB Memory Buffer
MFD Master File Directory
MQ Multiplier/Quotient Register
MUUO Monitor UUO (see UUO)
NI Network Interconnect
NZS Non-Zero Section
PC Program Counter
PDB Process Data Block
PI Priority Interrupt
PMA Physical Memory Address
PPB PPN Data Block
PPN Project-Programmer Number
PTY Pseudo-Terminal
PWQ Position Wait Queue
RAM Random-Access Memory
RIB Retrieval Information Block
SAT Storage Allocation Table
SCA Systems Communications Architecture
SCS Systems Communications Services
SFD Sub-File Directory
SMP Symmetric Multiprocessing
SPR Software Performance Report
SPT Special Pages Table (for mapping)
Storage Allocation Pointer Table (for disk I/O)
STR Structure Data Block
TKB Tape Controller Data Block
TTY Terminal
TUB Tape Unit Data Block
TWQ Transfer Wait Queue
UBR User Base Register
UDB Unit Data Block
UFD User File Directory
UNI Disk Unit Data Block
UPT User Process Table
UUO Unimplemented User Operation (monitor call)
VMA Virtual Memory Address
______________________________________________________________________
INDEX
-A-

AC blocks
  finding, 3-15
  switching, 3-6, 4-7
Access
  codes, 3-2
  table (ACC), 4-32
Accumulators, 2-6
  locations, 3-15
  monitor, 3-6, 4-1
  saving, 3-11
  scheduler, 3-15
  traps, 3-14
  user, 4-10
Addressing non-zero sections, 3-5
Allocating disk space, 4-27
Alternate page maps, 3-3
ANF-10 networks, 4-14
APR interrupts, 5-3
APRSER module, 5-5
AR/ARX parity errors, 5-6
Arithmetic Logic Unit (ALU), 5-7
Assigning channel numbers, 4-13
Attached terminals, 4-21
AU resource, 4-12
AUTCON module, 3-17, 4-24, 4-31
Automatic reloads, 2-3
AVALTB table, 4-11

-B-

30-bit addressing, 3-5
Blocking
  programs, 3-5
  user jobs, 3-7
BOOT, 2-1, 2-2
Booting systems, 2-2
Break characters, 4-19
Breakpointing monitors, 6-3
BUG. macro, 5-10
Building monitors, 4-5
Byte pointers, 4-4

-C-

Cacheable pages, 3-3
Caching
  disk information, 4-34
  UPT locations, 3-7
CALLI UUOs, 4-13
CDB
  constants area, 4-10
  defining locations, 4-11
  variables area, 4-10
Changing AC sets, 6-5
Channels, 4-13
  data blocks (CHN), 4-23, 4-24, 4-32
  error report, 5-5
  status bits, 4-13
Checking parity, 5-9
Chunks
  counts, 4-18
  terminal, 4-19
Clearing virtual addressing, 2-6
Clock, 3-14
CLOCK1 module, 3-16
Clusters, 4-27
CNFDVN location, 2-9
COMDEV module, 4-5
Command
  dispatch bits, 4-2
  files
    FILDDT, 2-9
    RSX-20F, 5-15
  tables, 4-12
COMMOD module, 4-5
COMMON module, 3-16, 4-5, 4-11, 4-21, 5-4
Common modules, 4-5
Compressed File Pointer (CFP), 4-27
COMTAB table, 4-12
Concealed mode, 3-5, 3-6
Conditionals, 4-7
Connecting devices, 5-8
CONSO skip chain, 3-10, 3-16
Console
  front-ends, 5-15
  terminal, 1-1
Continuable stopcodes, 1-2, 5-13
Control RAM (CRAM), 5-7
Controller data block (KON), 4-32
Controlling terminal, 4-21
Copying crash files, 2-3
CPNSER module, 3-17
CPU
  Data Blocks (CDBs), 2-6, 4-10, 5-5
  interlocks, 6-4
  stopcodes, 5-11
Crash
  analysis, 1-1
  files, 1-2, 2-1
  space, 2-1
Crash files, 1-1, 1-2, 1-5, 2-1, 2-3
CRASH.EXE file, 2-1
Creating
  crash files, 2-1
  FILDDT command files, 2-10
  symbolic FILDDT, 2-4
CREF
  listings, 4-7
  program, 4-6
CRSCPY program, 2-1, 2-3
CTXSER module, 3-17
CTY, 1-1
Current ACs, 2-8
Cursor position counter, 4-19
CX resource, 4-12
CYCLE error, 5-10
Cycles, 3-14

-D-

D36PAR module, 3-18
DA resource, 4-12
DDBs, 4-13
DEBUG stopcodes, 5-12
DEBUGF word, 5-13, 6-3
Debugging the monitor, 6-1
DECnet
  front-ends, 4-14
  layout, 3-3
Defining
  CDB locations, 4-11
  symbols, 3-18
Device
  codes, 3-6
  Data Blocks (DDBs), 4-2, 4-14
  information, 4-14
  interrupts, 3-9
  status word, 4-2
Devices
  RDA, 4-14
DEVIOS word, 4-2
DIE routine, 5-10, 5-12
DIECDB location, 2-6
Directories, 4-25
Disabling
  extended addressing, 2-6
  time messages, 6-5
  user addressing, 2-8
Disk
  cache, 4-34
  controller data block (KON), 4-2
  device data blocks, 4-33
  dual-ported devices, 4-7
  file structure, 4-25
  I/O, 4-24
  on-line information, 4-31
  storage allocation, 4-27
Dismissing interrupts, 3-10
DISP table, 4-12
Dispatch RAM (DRAM), 5-7
DN20 front-ends, 4-14
Doubleword PC, 3-4
DTE
  DDBs, 4-14
  interrupts, 3-12
DTEPRM module, 3-18, 4-8
DTESER module, 4-14
Dual-ported disks, 4-7

-E-

Echo count, 4-19
EDDT, 6-2
Enabling addressing, 2-6
ENQ/DEQ
  module, 3-17
ERnPDL stack, 3-15, 5-5
ERRCON module, 5-12
Error
  handling, 5-1
  hardware codes, 3-13
  parity, 5-9
  processing routines, 3-15
ETHPRM module, 3-18
EUE stopcodes, 5-15
EV resource, 4-12
EVENT stopcodes, 5-12
Exec
  Base Register (EBR), 3-2
  kernel mode, 3-5
  mode, 3-2, 3-4, 3-5, 3-6
Exec (Cont.)
  Process Table (EPT), 2-5, 2-6, 3-2
Exec-mode DDT, 6-2
EXECAC macro, 4-7
Execute-only programs, 3-6
Executing command files, 2-10
Execution Box (EBOX), 5-6
Executive UUO Error (EUE), 5-15
Exiting FILDDT, 2-4
Extended
  addressing, 2-6, 3-3
  channel table, 4-13
  software channels, 3-4

-F-

F module, 3-18, 4-7
FAKEAC flag, 2-6
Fast Memory (FM), 5-7
Fatal errors, 1-1, 1-2
Fault continuation, 1-3
Feature test options, 4-7
FILDDT
  command files, 2-9
  mapping commands, 2-6
  program, 2-3
Finding
  AC blocks, 3-15
  DDBs, 4-13
  stopcodes, 5-10
  symbolic definitions, 4-7
Flag-PC doubleword, 3-4
Flags for DEBUGF, 6-3
Forced commands, 4-12
Forced system dumps, 2-1
Forcing reloads, 2-1
Free core, 4-12
Front-ends, 4-14
Full clock cycle, 3-14
Funny space, 3-3, 3-4

-G-

Generating parity, 5-8
GLOB program, 4-8
Global
  section references, 3-5
  symbols, 4-1, 4-8
Groups of disk data, 4-26

-H-

HALT stopcodes, 5-11
Halting systems, 2-2
Handling
  errors, 5-1
  interrupts, 3-11
Hardware
  addressing, 2-6
  error codes, 3-13
  errors, 5-1
  interrupts, 3-16
  mapping, 3-2
HOME blocks, 4-26

-I-

I/O
  channels, 4-13
  Request Block (IORB), 4-23
  status word, 4-2
  tables, 4-13
IF statement, 4-7
IME stopcodes, 5-14
INFO stopcodes, 5-12
Inserting breakpoints, 6-3
Instruction Register (IR), 5-7
Interlocks between CPUs, 6-4
Interrupt, 3-7
  accumulators, 3-11
  error-handling, 3-10
  handling routine, 3-9
  levels, 3-7
  PDLs, 3-11
  processor, 5-3
  stacks, 3-11
  Vector (IVIR), 3-12
Interrupting
  on Level 0, 3-10
  on Level 7, 3-14
Intertask communication, 4-14
INTTAB table, 4-12
Invalid mapping, 2-8
IPCSER module, 3-17
IVIR register, 3-12

-J-

JBT tables, 4-8
JBTPPB table, 4-32
Job
  context module, 3-17
Job (Cont.)
  Device Assignment table (JDA), 4-13
  stopcodes, 5-11
  tables, 4-8
Job-specific monitor locations, 3-3
JOBDAT
  area, 3-6
  locations, 4-10
  module, 3-18, 4-8
  vestigial, 3-4

-K-

Keep Me bit, 3-3
Keep-Alive Fail (KAF), 5-13, 5-15
Kernel mode, 3-5
KL interrupt handling, 3-11
KL-paging, 2-6, 3-3
KLPPRM module, 3-18
KNO word, 4-33
KS
  alternate page maps, 3-3
  interrupt handling, 3-11
  reloading systems, 2-2

-L-

Label DDBs, 4-24
Line
  characteristics bits, 4-18
  Data Blocks (LDBs), 4-2, 4-18
LINTAB table, 4-19
Loading FILDDT symbols, 2-4
Local symbols, 4-1
  unlocking, 4-8
Locating EPTs, 2-6
Locations
  0-17, 2-6
  30, 2-1
  406, 2-2
  407, 2-2
  500, 3-12
  DIECDB, 2-6
LOKCON module, 3-17
Low segment addresses, 2-6

-M-

Macros, 4-6
MACSYM module, 3-18, 4-8
Magnetic tape devices, 4-23
Mapping
  ACs, 2-8
  dumps, 2-6
  exec virtual memory, 2-7
  extended sections, 2-6
  user jobs, 2-8
  verification, 2-8
  virtual addresses, 2-5, 2-6, 3-2
Master File Directory (MFD), 4-25
MCA25 bit, 3-3
MCB software, 4-14
Memory
  Box (MBOX), 5-6
  dump, 1-2
  tables, 4-12
MEMTAB table, 4-12
MIC information, 4-19
MM resource, 4-12
Mode flag, 3-4
Modules, 3-16
  common, 4-5
  monitor startup, 3-17
  optional, 3-17
  symbol definition, 3-18
MONGEN program, 4-5
Monitor
  ACs, 4-1
  breakpointing, 6-3
  building, 4-5
  command processing, 4-12
  functions, 3-1
  macros, 4-6
  modules, 3-16
  name, 2-9
  sources, 4-6
  startup modules, 3-17
  symbols, 4-1
  version numbers, 2-9
Monitor-resident user data, 3-3
Monitor-specific FILDDT, 2-4
MSCPAR module, 3-18
Multiple-KL systems, 4-7
MUUO, 3-6

-N-

Name Block (NMB), 4-32
Nested SFDs, 4-25
NETDEV module, 4-14
NETPRM module, 3-18, 4-8
NETSER module, 4-14
Network devices, 4-14
Non-Zero Sections (NZS), 2-6, 3-3, 3-5
Nonvectored interrupts, 3-9
NXM errors, 5-6

-O-

ONCE module, 3-17
Optional modules, 3-17

-P-

Page
  faults, 3-12
  map pointers, 3-2
  maps, 2-5, 3-2, 3-3
  tables, 4-12
Page fail
  codes, 3-13
  traps, 5-3, 5-4
  word, 3-12
PAGTAB table, 4-12
Parity
  errors, 5-5, 5-9
  generating, 5-8
Partial clock cycle, 3-14
Patch
  files, 2-9, 2-10
  space, 6-1
PATCH module, 3-17
Patching monitors, 6-1
Per-process monitor free core, 3-4
Performing terminal I/O, 4-20
Physical addresses, 2-6
PI
  channels, 3-7
  CYCLE error, 5-10
  status word, 3-8
Pointers, 3-2
  compressed file, 4-27
  DDB, 4-2
  MFD, 4-26
  retrieval, 4-26
Policy CPU, 2-2, 6-4
Position Wait Queue (PWQ), 4-32
Power-fail stack, 5-5
PPN Data Block (PPB), 4-32
Prime RIB, 4-26
Priority Interrupts (PI), 3-7
Process
  context word, 3-8
  Data Block (PDB), 4-2, 4-9
  tables, 3-2
Processing
  errors, 3-15
  UUOs, 4-13
Processor
  interrupts, 5-3
  modes, 3-2, 3-5
Program Counter (PC), 3-4, 5-7
Prototype KDBs, 4-24
Pseudo-instructions, 4-6
PSISER module, 3-17
Public
  mode, 3-5, 3-6
  pages, 3-3
PULSAR module, 4-24
Push-down lists, 3-11
  scheduler, 3-15
  traps, 3-14
PWFPDL stack, 5-5
PXCT instruction, 4-7

-Q-

QBITS table, 4-11
QUESER module, 3-17

-R-

RDA devices, 4-14
Reading monitor sources, 4-6
Real-time module, 3-17
Recovering from errors, 3-14
REFSTR module, 3-17
Registers, 5-7
Reloading automatically, 2-3
Reloads, 2-1
REQTAB table, 4-12
Resetting mapping, 2-8
Resources, 4-11
Restoring accumulators, 4-3
Retrieval Information Block (RIB), 4-26
RH10 interrupts, 3-12
RH20 interrupts, 3-12
RH2PRM module, 4-8
Role switching, 3-6, 6-4
RSX-20F errors, 5-15
RTTRP module, 3-17
Run queues, 4-11
Running
  FILDDT, 2-4
  symbolic FILDDT, 2-4

-S-

S module, 3-18, 4-8, 4-13
Saving symbolic FILDDT, 2-4
SAVnx routines, 3-11
SCAPRM module, 3-18
Scheduler
  ACs, 3-15
  tables, 4-11
SCNSER module, 4-18, 4-19
SCPAR module, 3-18
Sections, 3-3
  DECnet, 3-3
  mapping, 2-6
  pointers, 3-2
  references, 3-5
  tables, 3-2
SEILM routine, 3-14, 5-5
Servicing interrupts, 3-10
SET commands, 4-12
Sharable resources, 4-11
Shutting down systems, 2-2
Skip chain, 3-10
Software
  channels, 4-13
  disk cache, 4-34
Source code, 4-6
Spare RIB, 4-26
Special Pages Table (SPT), 2-6, 3-4
SPT slot, 2-8
Stacks
  error processing, 3-15
  interrupt, 3-11
Starting BOOT, 2-2
Startup modules, 3-17
Status bits
  channels, 4-13
  I/O, 4-2
STOP stopcodes, 5-11
STOPCD macro, 5-10
Stopcodes, 1-2, 2-12, 5-10
Storage Allocation Blocks (SABs), 4-31
Storage allocation Pointer Tables (SPTs), 4-32
Storage Allocation Tables (SATs), 4-27
Structure
  data blocks (STRs), 4-31
  disk, 4-25
Sub-File Directories (SFDs), 4-25
Superclusters, 4-27
Swapped-out pages, 4-12
SWITCH.INI files, 3-4
Switching
  AC blocks, 3-6, 3-11, 4-7
  CPUs, 6-4
  modes, 3-6
  UPTs, 3-11
Symbol definition, 3-18
Symbolic FILDDT, 2-4
Symbols
  monitor, 4-1
  verifying, 2-9
Symmetric Multi-Processing (SMP), 4-7, 6-4
SYSINI module, 3-17, 4-20
SYSPPB table, 4-32
SYSSTR table, 4-31
SYSTAT program, 2-3

-T-

TABSTR table, 4-31
Tape
  controller data block (KDB), 4-2, 4-23
  I/O, 4-23
  label processing, 4-24
  unit data block (TUB), 4-23
Terminal
  chunk pointers, 4-18
  chunks, 4-19
  controlling, 4-21
  DDBs, 4-18, 4-20
  Device Data Blocks, 4-18
  I/O, 4-20
TMPCOR, 3-4
Transfer
  tables, 4-11
  Wait Queue (TWQ), 4-32
Trapping
  page faults, 3-12, 5-4
  UUOs, 3-6
Trapping page faults, 5-3
TSKSER module, 4-14
TTFCOM table, 4-12
TTYINI routine, 4-20
TTYTAB table, 4-21
-U-

UCLJMP table, 4-13
UCLTAB table, 4-13
UFD Data Block (UFB), 4-33
Unit
  Data Blocks (UDBs), 4-2, 4-31
Universal files, 4-8
Unlocking local symbols, 4-8
UNQTAB table, 4-12
Unrestricted device codes, 3-6
UPT locations, 3-7
User
  accumulators, 2-6, 2-8
  ACs, 4-10
  Base Register (UBR), 3-2
  buffers, 4-20
  concealed mode, 3-5
  DDBs, 3-4
  File Directories (UFDs), 4-25
  jobs
    blocking, 3-7
    mapping, 2-8
    switching, 3-11
    verifying, 2-8
  mode, 3-2, 3-4, 3-5, 3-6
  Process Table (UPT), 2-5, 2-6, 3-2
  public mode, 3-5
USERAC macro, 4-7
Using
  command files, 2-10
  CREF listings, 4-6
  EDDT, 6-2
  FILDDT, 2-4
  SYSTAT, 2-3
USRJDA location, 4-13
UUOCON module, 3-16
UUOERR routine, 4-13
UUOs
  processing, 4-13
  trapping, 3-6
  verification, 3-6
UUOTAB table, 4-13

-V-

Vectored interrupts, 3-9, 3-12
Verifying
  FILDDT mapping, 2-8
  UUOs, 3-6
Virtual
  address mapping, 3-2
  addressing, 2-5, 2-6
  Memory Address (VMA), 5-7
  sections, 3-3

-W-

Writeable pages, 3-3

-X-

XPN: area, 2-3

-Y-

YES word, 4-33