TOPS-10/TOPS-20
SPEAR Manual
VERSION 6.0 INTERIM RELEASE DRAFT
December 1984
This manual describes the SPEAR product
(Standard Package for Error Analysis and
Reporting). SPEAR contains two
functions that report on the errors and
events that are recorded by the
operating system.
For TOPS-20 systems, this manual
supersedes the TOPS-10/TOPS-20 SPEAR
Manual, Order Number: AA-J833A-TK.
OPERATING SYSTEM: TOPS-10 7.02
TOPS-20 4.1 (KS/KL Model A)
TOPS-20 6.0 (KL Model B)
SOFTWARE: SPEAR 2.0
First Printing, April 1982
Revised, December 1984
The information in this document is subject to change without notice
and should not be construed as a commitment by Digital Equipment
Corporation. Digital Equipment Corporation assumes no responsibility
for any errors that may appear in this document.
The software described in this document is furnished under a license
and may only be used or copied in accordance with the terms of such
license.
No responsibility is assumed for the use or reliability of software on
equipment that is not supplied by DIGITAL or its affiliated companies.
Copyright (C) 1982, 1984, Digital Equipment Corporation.
All Rights Reserved.
The postage-prepaid READER'S COMMENTS form on the last page of this
document requests the user's critical evaluation to assist us in
preparing future documentation.
The following are trademarks of Digital Equipment Corporation:
DEC DECnet IAS
DECUS DECsystem-10 MASSBUS
Digital Logo DECSYSTEM-20 PDT
PDP DECwriter RSTS
UNIBUS DIBOL RSX
VAX EduSystem VMS
VT
CONTENTS
CHAPTER 1 SPEAR OVERVIEW
1.1 INTRODUCTION
1.2 USER PROFILES AND INTERACTION
CHAPTER 2 THE SYSTEM EVENT FILE
2.1 INTRODUCTION
2.2 ENTRY CATEGORIES
2.2.1 Software Entries
2.2.2 Hardware Entries
2.2.2.1 CPU and Memory Failures
2.2.2.2 Channel and Controller Failures
2.2.2.3 I/O Device Failures
2.2.3 Performance Entries
2.3 RECORDING EVENTS
2.3.1 Record Format
2.3.2 Record Conventions for Numbers and Dates
CHAPTER 3 ISOLATING FAULTS
3.1 INTRODUCTION
3.2 TYPES OF FAILURES
3.2.1 Characteristics of Solid Failures
3.2.2 Characteristics of Intermittent Failures
3.3 ERROR DETECTING AND ERROR CHECKING
3.3.1 Hardware Error Detectors
3.3.2 Software Error Checking
3.4 ISOLATION TECHNIQUES
3.4.1 Verification
CHAPTER 4 SPEAR FUNCTIONS
4.1 INTRODUCTION
4.2 RUNNING SPEAR
4.2.1 Prompts, Responses, and Arguments
4.2.2 Separators and Terminators
4.2.3 Help Features
4.2.4 File Specifications
4.2.5 SPEAR Switches
4.2.6 Exiting from SPEAR
4.3 RETRIEVE
4.3.1 RETRIEVE Input
4.3.2 RETRIEVE Output
4.3.3 RETRIEVE Procedure
4.3.3.1 Retrieving Selected Events
4.3.3.2 Sample RETRIEVE Session
4.3.3.3 Short Format
4.3.3.4 Octal Format
4.3.3.5 Full Format
4.4 SUMMARIZE
4.4.1 The SUMMARIZE Report
4.4.2 Error Register Codes
4.4.3 SUMMARIZE Procedure
4.4.4 Sample SUMMARIZE Session
4.5 TOPS-20 KLSTAT MODE
4.5.1 KLSTAT Procedure
CHAPTER 5 ENTRY DESCRIPTIONS
5.1 INTRODUCTION
5.2 TOPS-10 ENTRIES
5.2.1 System Reload
5.2.2 Non-Reload Monitor Error
5.2.3 Crash Extract
5.2.4 Data Channel Error
5.2.5 DAEMON Started
5.2.6 MASSBUS Disk Error
5.2.7 DX20 Device Error
5.2.8 Software Event
5.2.9 Configuration Status Change
5.2.10 System Log Entry
5.2.11 Software Requested Data
5.2.12 Magtape System Error
5.2.13 Front End Device Report
5.2.14 Front End Reload
5.2.15 KS10 Halt Status Block
5.2.16 Magtape Statistics
5.2.17 Disk Statistics
5.2.18 DL10 Communications Error
5.2.19 KL10 Parity or NXM Interrupt
5.2.20 KS10 NXM Trap
5.2.21 KL10 or KS10 Parity Trap
5.2.22 Memory Sweep for NXM
5.2.23 Memory Sweep for Parity
5.2.24 CPU Status Block
5.2.25 Device Status Block
5.2.26 Line printer Error
5.2.27 Unit Record Error
5.3 TOPS-20 ENTRIES
5.3.1 TOPS-20 System Reloaded
5.3.2 TOPS-20 BUGCHKs and BUGHLTs
5.3.3 MASSBUS Device Error
5.3.4 DX20 Device Error
5.3.5 Drive Statistics Entries
5.3.6 Configuration Status Change
5.3.7 System Log Entry
5.3.8 Front-End Device Report
5.3.9 Front End Reloaded
5.3.10 Processor Parity Trap
5.3.11 Processor Parity Interrupt
5.3.12 KL CPU Status Block
5.3.13 MF20 Device Report
5.3.14 KLERR Front End Device Report
5.4 DECNET ENTRIES (V2.1)
5.4.1 Network Control Started
5.4.2 Network Up-Line Dump
5.4.3 Network Down-Line Load
5.4.4 Network Hardware Error
5.4.5 Network CHECK11 Report
5.4.6 Network Line Statistics
5.5 DECNET ENTRIES (V3.0)
APPENDIX A SPEAR MESSAGES
APPENDIX B INSTALLATION PROCEDURES
B.1 INTRODUCTION
B.1.1 SPEAR Files
B.1.2 Loading and Installing SPEAR
APPENDIX C COMMAND AND CONTROL FILES
APPENDIX D EVENT CODES
APPENDIX E DISK SUBSYSTEM ERROR BITS
APPENDIX F NETWORK EVENT PARAMETERS
APPENDIX G GLOSSARY
INDEX
FIGURES
2-1 Components of a Computer System
TABLES
4-1 Device Types
4-2 Network Event Classes
4-3 Subprompts for Device Types
4-4 Error Types
4-5 MASSBUS Disk Registers
4-6 Tape Registers
4-7 Subprompts for Device Types
5-1 Network Event Classes
A-1 User Validation Messages
A-2 Dialogue Usage Messages
A-3 Warning Messages
A-4 Event File Messages
D-1 TOPS-10 and TOPS-20 Event Codes
PREFACE
This manual describes Version 2.0 of SPEAR on TOPS-10 and TOPS-20.
The primary audience for this manual is a person with experience in
the following areas:
1. Fault isolation techniques
2. KL10 instruction set
3. All hardware connected to the various configurations of
TOPS-10 or TOPS-20
If you do not have the above experience, refer to:
TOPS-10 Operators Guide
TOPS-20 Operators Guide
DECsystem-10/DECSYSTEM-20 Processor Reference Manual
DECsystem-10 Hardware Reference Manual
READING PATH
This manual has three functions: it serves as a learning aid, a
user's guide, and a reference tool for those who have already learned
to use the SPEAR library.
As a learning aid: Chapters 1, 2, and 3 provide an overview of the
SPEAR library. They also provide background information necessary to
understand and use the SPEAR library.
As a user's guide: Chapter 4 provides step-by-step procedures for
using the SPEAR functions RETRIEVE and SUMMARIZE. This chapter
explains the command syntax and the response parameters associated
with each function.
As a reference tool: Chapter 5 and the appendixes provide reference
material such as system event file formats, error messages, and a
glossary. This material is not meant to be read from beginning to
end. Use Chapter 5 and the appendixes as a reference when you need
them.
CONVENTIONS USED IN THIS MANUAL
The following conventions are used throughout this manual:
Contrasting colors Red - where examples contain both user
input and computer output, the
characters you type are in red; the
characters SPEAR prints are in black.
Lowercase letters Lowercase letters in a command string
indicate variable information you must
supply.
UPPERCASE LETTERS Uppercase letters in a command string
indicate fixed (literal) information
that you must enter as shown.
[ ] Square brackets indicate optional
information that you can omit from a
command string. Do not type the square
brackets.
Examples All examples were produced on either the
TOPS-10 or the TOPS-20 operating system.
This symbol represents where you press
the Escape key.
This symbol represents where you press
the RETURN key.
BACKGROUND INFORMATION
CHAPTER 1
SPEAR OVERVIEW
1.1 INTRODUCTION
This chapter introduces you to the SPEAR product and gives an overview
of its use.
The name SPEAR is an acronym for Standard Package for Error Analysis
and Reporting. The main function of SPEAR is to help isolate the
cause of a failure through information contained in the system event
file. Most failures are intermittent; that is, they are active at one
instant, causing a system malfunction, and inactive at another,
allowing normal system operation. The task at hand is to find the
cause of the failure and correct the problem in the least amount of
time. SPEAR helps you accomplish this task.
SPEAR contains functions that report on the errors and events that are
recorded by the operating system, TOPS-10 or TOPS-20. In the past,
the field service engineer was forced to analyze intermittent failures
by sorting through error reports generated by SYSERR, looking for
common failure patterns. For example, the engineer examined several
disk reports looking for common media failures, common disk head
failures, or common failures of the read/write circuitry. Now, SPEAR
can do the tedious work.
The system event file contains entries made by the operating system
and the communications subsystems (if any). Each time certain events
occur, the operating system records and stores pertinent data in the
system event file. The operating system continually monitors and
records information about every disk, tape, and memory parity error as
it occurs, along with errors from other subsystems. At your
discretion, you can call on SPEAR to generate a report of selected
events.
For more information on the system event file, refer to Chapter 2.
For samples of events your operating system can record, refer to
Chapter 5.
The SPEAR program consists of the following functions:
o RETRIEVE
o SUMMARIZE
o KLSTAT (TOPS-20 only)
These function names are also the primary commands you type to run the
particular function of SPEAR in which you are interested.
RETRIEVE reads the binary data in the system event file and produces
an ASCII report for each entry selected. RETRIEVE also allows you to
save specific entries either for later analysis and translation or for
record-keeping purposes.
SUMMARIZE reads the binary data in the system event file and produces
an ASCII summary report. Refer to Section 4.4 for a description of
SUMMARIZE.
Chapter 4 describes these functions in detail, along with an
additional feature available only on TOPS-20, KLSTAT mode.
1.2 USER PROFILES AND INTERACTION
There are three main groups of SPEAR users:
1. Field Service and Software Support personnel who have
specific maintenance responsibilities.
2. System operators who must recognize failures and initiate
recovery procedures.
3. System managers who have a need to monitor overall system
performance and schedule system use.
These groups each have varying degrees of expertise in software and
hardware areas. SPEAR meets the needs of each group and can guide new
users as well as experienced ones.
The system operator and Field Service engineer can cooperate by using
SPEAR as a tool for both preventive and corrective maintenance.
CHAPTER 2
THE SYSTEM EVENT FILE
2.1 INTRODUCTION
This chapter discusses the file that SPEAR uses for input, the system
event file. Specifically, this chapter discusses what events are
recorded, how they are recorded, and what form they take within their
respective files.
Each operating system and communications subsystem has its own error
logging facility to gather and maintain information on system errors
and events as they occur. The error logging facility detects a
variety of hardware and software errors, providing a detailed record
of system activity. When an error occurs, the facility gathers
significant data about the current state of the system; the type of
data it gathers depends on the type of error detected. In addition to
detecting actual errors, the facility monitors events that reflect
other aspects of system performance. The recording of such events
helps to define the system context in which actual errors occur.
The events are recorded in a system event file, ERROR.SYS. The
logical name for the location of this file (structure and directory)
depends on which operating system you are using. The following list
gives you the names to use to locate your system event file:
o TOPS-10 V7.02 SYS:ERROR.SYS
o TOPS-20 V4.1 SYSTEM:ERROR.SYS
o TOPS-20 V6.0 SERR:ERROR.SYS
Events that occur during the operation of the system are logged into
the system event file for use in preventive maintenance as well as
corrective maintenance. These events occur within the various
hardware and software components of the system, such as:
Hardware Software
CPU Operating system
Memory Memory management
I/O I/O
Console File system
Some of the events that can occur include parity errors, address
failures, operator log entries, system reloads, device mounts and
dismounts. Each time one of these events occurs, an entry is appended
to the system event file in binary format.
2.2 ENTRY CATEGORIES
There are two general categories of entries in the system event file,
error and nonerror. Both categories can be broken down further into
the following:
1. Software entries
2. Hardware entries
3. Performance entries
The following three sections describe the entry types that can be
found in the system event file.
2.2.1 Software Entries
The software error entries that SPEAR is concerned with are internal
software errors. On TOPS-10, these errors result in a STOPCD; on
TOPS-20, these errors result in a BUGHLT, BUGCHK, or BUGINF.
A STOPCD is represented by a 3-letter message that is printed at the
operator's terminal (CTY) when the operating system detects a serious
error. Sometimes the operating system crashes immediately following
this message; at other times the operating system continues to run but
halts the current job. The action the operating system takes depends
on the severity of the problem. There are five types of STOPCDs:
1. HALT - The system halts and you must manually dump and
reload the operating system.
2. STOP - All jobs are aborted, and the system automatically
dumps and reloads itself.
3. CPU - This is the same as STOP except that it occurs on
dual-processor systems. Jobs are aborted only on the
processor where the error occurs.
4. JOB - The current job is aborted and processing continues.
5. DEBUG - A message prints and processing continues.
The list of all stopcode messages is documented in the STOPCD
specification in the TOPS-10 Software Notebooks.
The TOPS-20 operating system errors also range in severity. A BUGHLT
is the most serious. It is a non-recoverable error detected by the
operating system. A BUGCHK is a recoverable error detected by the
operating system, while a BUGINF is a message informing you that a
certain event related to the operating system has occurred. BUGHLTs,
BUGCHKs, and BUGINFs are listed in the TOPS-20 Operators Guide.
2.2.2 Hardware Entries
The hardware entries come from a variety of subsystems: CPU, memory,
I/O, console, and networks. The number and type of components depend
on the system configuration. In general, Figure 2-1 represents the
major components or subsystems that can contribute entries to the
system event file.
Figure 2-1: Components of a Computer System
Hardware error entries are the most frequent type of error entry. These
errors are caused by a failure in the hardware itself. Each time an
event of this type occurs, an entry is made into the system event
file. Hardware error entries can be divided into three general
categories:
1. CPU-instruction and CPU-addressing failures
2. Controller and channel failures
3. I/O errors
Because the system hardware cannot be expected to operate continuously
without failure, the design of the hardware includes facilities to
monitor the hardware operation. (One such facility is the parity
check.) Once the system has detected an error, it can either signal
the CPU and system software that an error has occurred or attempt to
recover from the error and notify the software if it cannot recover
successfully. This activity is recorded in the form of one or more
entries in the system event file.
2.2.2.1 CPU and Memory Failures - The first category is a failure
occurring in the CPU and main storage section of the system. This
type of failure is perhaps the most difficult to handle correctly.
These failures can easily corrupt either the operating system software
or a user program, or cause instructions to execute incorrectly. A
failure in an addressing section can cause the system to operate with
incorrect data or unknowingly modify some other job's program or data.
For these reasons, CPU errors ordinarily cause the crash of a job or
the entire system, depending on whether a user or the operating system
is in control.
2.2.2.2 Channel and Controller Failures - The second category of
hardware error entry is a channel or controller failure. The system
controllers monitor and control several I/O devices of the same type,
and the channels of various types connect the CPU and/or main storage
units with the I/O controllers and devices. These errors are likely
to affect several jobs or users because each controller or channel can
handle several I/O devices being used by many jobs or processes.
Detected errors are signalled to the CPU, and the operating system may
stop the current operation if the error is serious. An example is a
controller's parity check of a command issued by the CPU. If this
parity check fails, the command will not be performed, and the error
will be signalled to the CPU. Such an event is recorded in the system
event file for subsequent retrieval by SPEAR.
2.2.2.3 I/O Device Failures - The third category of hardware error
entry is a failure of an I/O device. Errors detected by a single I/O
device are handled in the same manner as channel and controller
failures, but usually the error affects only one job or task. Some I/O
failures are caused by faulty media. The most frequently used form of
error recovery in this case is to retry the failing operation. If the
failure continues for a specified number of consecutive retries, the
job or task is crashed. Each failure is recorded in the system event
file.
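The retry-then-abort recovery described above can be sketched in Python. This is a minimal illustration, not monitor code: the function name do_io_with_retry, the JobCrashed exception, and the use of a list to stand in for the system event file are all hypothetical.

```python
from typing import Callable, List


class JobCrashed(Exception):
    """Raised when an I/O operation keeps failing after all allowed retries."""


def do_io_with_retry(operation: Callable[[], None],
                     max_retries: int,
                     event_log: List[str]) -> int:
    """Run operation, retrying it on OSError.

    Returns the number of failures that preceded a successful attempt.
    Every failure is appended to event_log, which stands in for the
    system event file. If the first attempt and all max_retries retries
    fail, the job is "crashed" by raising JobCrashed.
    """
    failures = 0
    for attempt in range(1, max_retries + 2):   # first try plus the retries
        try:
            operation()
            return failures
        except OSError as err:
            failures += 1
            event_log.append(f"attempt {attempt}: {err}")  # record each failure
    raise JobCrashed(f"{failures} consecutive failures")
```

For example, an operation that fails twice and then succeeds returns 2 and leaves two entries in event_log, matching the text's point that every failure is recorded even when the retry eventually works.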
2.2.3 Performance Entries
The system event file contains more than just error entries. It also
contains entries concerning day-to-day events of the system. These
events vary depending on the operating system, but in general you
might find entries of the following kinds:
1. System reloads
2. Tape and disk mounts/dismounts
3. Operator messages
These entries add context to the error entries by showing what the
system was doing when errors occurred. Keeping track of system
performance is a useful aid in preventive maintenance.
2.3 RECORDING EVENTS
The operating system continually detects and records every disk,
tape, and memory parity error as it occurs. For each event, the
operating system:
1. Detects the event
2. Identifies the type of event
3. Associates it with a device
4. Gathers information about it
5. Records the date and time
6. Stores the information as an entry by appending it to the
system event file
7. In some cases, tries to recover or find a way around the
error
The system event file is a sequential file; therefore, each new entry
is written to the end of the file. SPEAR can format these entries
into an ASCII report with its RETRIEVE function. Refer to Section 4.3
for information on RETRIEVE. The following section describes the
template that each entry fills.
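The append-only behavior of the system event file can be illustrated with a small Python sketch. The record layout used here (a 4-byte length prefix, a 2-byte entry type, and a 4-byte timestamp) is invented for illustration and is not the actual ERROR.SYS format; the function names are likewise hypothetical. Because entries are only ever appended, a reader that walks the file from the beginning can number each entry by its position.

```python
import io
import struct
import time
from typing import BinaryIO, Iterator, Tuple

# Hypothetical layout, for illustration only:
#   4-byte big-endian payload length, then a payload made of
#   2-byte entry type + 4-byte timestamp + entry body.
_HEAD = struct.Struct(">HI")


def append_entry(f: BinaryIO, entry_type: int, body: bytes) -> None:
    """Append one entry to the end of the file; nothing is ever rewritten."""
    payload = _HEAD.pack(entry_type, int(time.time())) + body
    f.seek(0, io.SEEK_END)
    f.write(struct.pack(">I", len(payload)) + payload)


def read_entries(f: BinaryIO) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (sequence_number, entry_type, timestamp, body) in file order."""
    f.seek(0)
    seq = 0
    while True:
        prefix = f.read(4)
        if len(prefix) < 4:          # end of file
            return
        (length,) = struct.unpack(">I", prefix)
        payload = f.read(length)
        entry_type, stamp = _HEAD.unpack_from(payload)
        seq += 1                     # position in the file, counted from 1
        yield seq, entry_type, stamp, payload[_HEAD.size:]
```

Writing two entries to an io.BytesIO buffer and reading them back yields sequence numbers 1 and 2, in the order the entries were appended.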
2.3.1 Record Format
Each entry in a TOPS-10 or TOPS-20 system event file is composed of
two sections: a header section and a body section. In each entry
report, the header section is the top section, enclosed in asterisks.
It contains the following information:
1. The entry type
2. The time the entry was recorded
3. The operating system uptime at the time of the entry
4. The serial number of the CPU where the entry occurred
5. The record sequence number
The record sequence number indicates the position of the entry in the
file. SPEAR assigns the record sequence number to the entry when you
RETRIEVE it.
The header format is the same on both operating systems. The
following is a sample of an entry header on TOPS-20 after it has been
translated by SPEAR:
************************************************************
MASSBUS DEVICE ERROR
LOGGED ON FRI 13 JUN 80 03:23:15 MONITOR UPTIME WAS 2:34:08
DETECTED ON SYSTEM #2137.
RECORD SEQUENCE NUMBER: 344.
************************************************************
On TOPS-10, if the system crashed and the entry has been copied from
the CRASH.EXE file, the header states this fact at the top of the
section. For example:
***********************************************************
**THIS ENTRY COPIED FROM A SAVED CRASH**
.
.
.
.
***********************************************************
Because the information was extracted from a saved crash instead of a
running operating system, the date and time of the entry and the
uptime listed in the header are the last values recorded by the
operating system before it crashed. (Note that multiple entries
extracted from a crash will have identical DATE, TIME, and UPTIME.)
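The five header fields listed earlier in this section map naturally onto a small record type. The following Python sketch, with hypothetical class and field names, renders a header in the style of the TOPS-20 sample. It assumes the default C locale for the day and month abbreviations, and it prints decimal values with a trailing point, following the conventions described in Section 2.3.2.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EntryHeader:
    entry_type: str      # for example, "MASSBUS DEVICE ERROR"
    logged: datetime     # time the entry was recorded
    uptime: timedelta    # operating system uptime at that moment
    cpu_serial: int      # serial number of the CPU where the entry occurred
    sequence: int        # record sequence number

    def render(self) -> str:
        """Format the header like SPEAR's translated output."""
        stars = "*" * 60
        hours, rest = divmod(int(self.uptime.total_seconds()), 3600)
        minutes, seconds = divmod(rest, 60)
        # %a and %b give English abbreviations under the default C locale.
        when = self.logged.strftime("%a %d %b %y %H:%M:%S").upper()
        return "\n".join([
            stars,
            self.entry_type,
            f"LOGGED ON {when} MONITOR UPTIME WAS "
            f"{hours}:{minutes:02d}:{seconds:02d}",
            f"DETECTED ON SYSTEM #{self.cpu_serial}.",    # trailing point: decimal
            f"RECORD SEQUENCE NUMBER: {self.sequence}.",
            stars,
        ])


header = EntryHeader("MASSBUS DEVICE ERROR", datetime(1980, 6, 13, 3, 23, 15),
                     timedelta(hours=2, minutes=34, seconds=8), 2137, 344)
print(header.render())
```

Run as shown, this reproduces the LOGGED ON, DETECTED ON, and RECORD SEQUENCE NUMBER lines of the sample header. A TOPS-10 crash extract would add its banner line above the entry type; that case is left out of the sketch.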
The body section of the entry contains the various data items that
make up the entry. The format of the header is constant regardless of
the entry type, but the body varies according to the type of entry.
The amount of information that is reported in the body also varies
depending on the format you specify to RETRIEVE. You can receive a
SHORT version of an entry with only summary information or a FULL
entry with all the information that is in the system event file.
Refer to Section 4.3 for more information on the RETRIEVE function.
2.3.2 Record Conventions for Numbers and Dates
In the entries on TOPS-10 and TOPS-20, most numbers output by SPEAR
are either decimal or octal. If SPEAR uses another numbering system,
it is so noted on any report you request. Decimal values always
contain a decimal point; all other values are octal. Values printed
in half-word format have leading zeroes suppressed in each half of the
word, and the halves are separated with a comma.
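The number conventions above are mechanical enough to express in a few lines of Python. In this sketch (the function names are hypothetical), a trailing decimal point selects base 10 and anything else is read as octal. A half-word value is treated as a 36-bit word split into two 18-bit halves, each printed in octal without leading zeroes.

```python
HALF_MASK = 0o777777  # 18 one bits: one half of a 36-bit word


def parse_spear_number(text: str) -> int:
    """Decode a SPEAR number: trailing '.' means decimal, otherwise octal."""
    text = text.strip()
    if text.endswith("."):
        return int(text[:-1], 10)
    return int(text, 8)


def format_halfwords(word: int) -> str:
    """Print a 36-bit word as left,right octal halves, leading zeroes suppressed."""
    left = (word >> 18) & HALF_MASK
    right = word & HALF_MASK
    return f"{left:o},{right:o}"


def parse_halfwords(text: str) -> int:
    """Rebuild a 36-bit word from its 'left,right' half-word form."""
    left, right = text.split(",")
    return (int(left, 8) << 18) | int(right, 8)


print(parse_spear_number("344."))         # 344
print(parse_spear_number("344"))          # 228
print(format_halfwords(0o000001000020))   # 1,20
```

The same digits "344" mean two different values depending on the decimal point, which is why the point matters when you read a report.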
For register values that are translated to text, such as the CONI
value, SPEAR translates only the bits or bytes of interest, and it
also dumps the whole value. For example, the CONI value might include
a DONE bit and a PI assignment, but these bits are not translated to
text.
All dates and times printed by SPEAR are in your local time zone
(for example, EST) unless otherwise stated.
Refer to Chapter 5 for samples of entries that can appear in the
system event file of your operating system.
CHAPTER 3
ISOLATING FAULTS
3.1 INTRODUCTION
The main reason for using SPEAR is to isolate the faults that are
causing intermittent failure of the system. To help you anticipate
the problems you can run into when trying to find the cause of these
failures, this chapter discusses:
1. The types of failures that can occur and what causes them.
2. The various error-checking schemes built into the system.
3. Some techniques to follow in isolating these failures.
3.2 TYPES OF FAILURES
A fault is a condition that causes a system component to fail to
perform as expected. For example, such a condition could be a broken
wire, a power supply fluctuation, or an unexpected interaction between
two or more software routines. As a matter of course, the operating
system records the symptoms of these occurrences in the system event
file for later reference.
A fault is not necessarily noticeable on its own. A failure occurs
only when a fault has an adverse effect on system performance, so a
fault usually does not become apparent until a failure occurs.
You are likely to find several faults before you find the one that is
causing the failure. Therefore, always confirm that the fault you
corrected is indeed the one that is causing the failure. Refer to
Section 3.4.1 for verification techniques.
You should also be on the lookout for changes in performance that may
indicate an impending failure. By running SPEAR daily and keeping a
record of its output, you may be able to detect a problem before it
causes a failure.
There are two general categories of failures caused by faults. They
are:
o Solid failures
o Intermittent failures
3.2.1 Characteristics of Solid Failures
A fault that affects the system in a permanent manner results in a
solid failure. A solid failure is easier to solve than an
intermittent failure.
Because a solid failure is reproducible, you have a basis on which to
research, identify, and eliminate the cause of the failure.
3.2.2 Characteristics of Intermittent Failures
A fault that affects the system in a temporary manner can result in an
intermittent failure. An intermittent failure is more difficult to
solve than a solid failure. Something must be causing the failure to
occur and something must be making it go away. The secret behind
finding the cause of an intermittent failure is knowing that somehow,
somewhere, something is changing the conditions under which the system
is running. The changing conditions, in turn, make the problem
intermittent.
For field service engineers: the next time you are working on a
really tough intermittent problem (after checking the power supplies
and ground system and running the appropriate diagnostics), try
stepping back and thinking about the problem. Think about what the
system is doing. Watch it for a while. See if you can identify the
exact conditions at the time of the failure. Use SPEAR to examine the
system event file for the events recorded before and after the
failure.
If you can identify the conditions, then maybe you can reproduce them.
If you can reproduce the conditions, then you have changed the
intermittent failure into a solid failure. Although the approach to
solving a solid failure is the same as the approach to solving an
intermittent failure, in many cases, you will find that solving a
solid failure is easier.
3.3 ERROR DETECTING AND ERROR CHECKING
The system has several means by which to check for errors in both the
hardware and software. The hardware contains error-detection
circuits, and the software contains error-checking routines. Both the
detection circuits and checking routines serve a dual purpose: (1) to
minimize the effects of a failure on overall system performance, and
(2) to help isolate the cause of a failure.
3.3.1 Hardware Error Detectors
There are three basic types of hardware error detectors in common use:
1. Threshold error detectors
2. Timing error detectors
3. Parity error detectors
Threshold error detectors monitor critical analog circuits, such as
power supplies, servomechanisms, write current circuits, and
temperature probes.
Timing error detectors monitor asynchronous events within the system,
such as data requests to main memory or cache. The memory or cache
must respond to the request within a certain amount of time. If it
does not, the nonexistent-memory timing-error detector sets an error
condition. Other asynchronous events that must be monitored for
proper timing include index and sector pulses, disk and tape
up-to-speed operations, and internal and external clocks.
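Threshold and timing detectors are hardware circuits, but their decision logic reduces to simple comparisons. The following Python sketch models that logic only. The function names and the example limits are hypothetical, and None stands for a request that never receives a response, such as a reference to nonexistent memory.

```python
from typing import Optional


def threshold_error(value: float, low: float, high: float) -> bool:
    """Threshold detector: flag an analog reading outside [low, high]."""
    return not (low <= value <= high)


def timing_error(response_ns: Optional[float], limit_ns: float) -> bool:
    """Timing detector: flag a request not answered within limit_ns.

    None models a request that never gets a response, such as a
    reference to nonexistent memory.
    """
    return response_ns is None or response_ns > limit_ns


# Example limits (hypothetical): a 5 V supply allowed to drift 5 percent,
# and a memory request that must be answered within 1000 ns.
print(threshold_error(5.30, 4.75, 5.25))  # True: supply out of tolerance
print(timing_error(None, 1000.0))         # True: nonexistent memory
print(timing_error(800.0, 1000.0))        # False: answered in time
```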
Parity error detectors monitor the transfer of information. The
parity generator adds one or more extra bits to the information being
transferred to satisfy a particular parity algorithm. For example,
with single-bit odd parity, the extra parity bit ensures that the
total number of one bits in the transfer is odd. The parity error
detector monitors each transfer. Should a transfer ever contain an
even number of one bits, the parity error detector raises a parity
error condition. Note that if two bits are dropped from the same
transfer (or, in general, if any even number of bits change), the
transfer still has odd parity, so the error goes undetected.
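The following Python sketch models single-bit odd parity on an integer treated as a string of bits. It shows that dropping one bit is detected, while dropping two bits leaves the parity odd and goes unnoticed. The function names are illustrative only.

```python
def odd_parity_bit(data: int) -> int:
    """Return the parity bit that makes the total number of one bits odd."""
    return 0 if bin(data).count("1") % 2 == 1 else 1


def parity_error(data: int, parity_bit: int) -> bool:
    """Odd-parity check: an even total number of one bits is an error."""
    return (bin(data).count("1") + parity_bit) % 2 == 0


word = 0b101100                  # three one bits
p = odd_parity_bit(word)         # 0, because the count is already odd
print(parity_error(word, p))                # False: good transfer
print(parity_error(word & ~0b000100, p))    # True: one bit dropped, detected
print(parity_error(word & ~0b001100, p))    # False: two bits dropped, undetected
```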
Once any one of these detectors detects an error condition, the
operating system records the information as an entry in the system
event file. These are the kinds of events you will be looking for
when using the SPEAR library.
3.3.2 Software Error Checking
There are four types of software error checking routines in common
use:
1. Range checking
2. Validity checking
3. Sum checking (checksum)
4. Loop checking
A range checking routine verifies that the arguments supplied to a
routine fall between two known values.
A validity checking routine verifies that a routine written to accept
only certain arguments indeed accepts only those arguments. Any other
argument causes an error condition.
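Range and validity checks can be sketched as guard functions that raise an error for bad arguments. In this Python illustration, the bounds 0 through 7 and the set of report formats are example values chosen for the sketch (the format names echo the RETRIEVE formats in Chapter 4); they are not taken from monitor code.

```python
REPORT_FORMATS = frozenset({"SHORT", "OCTAL", "FULL"})  # example argument set


def range_check(value: int, low: int, high: int) -> None:
    """Range check: the argument must lie between two known values."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low}..{high}")


def validity_check(value: str, allowed: frozenset) -> None:
    """Validity check: only the arguments the routine was written for are accepted."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not a valid argument")


range_check(3, 0, 7)                      # passes silently
validity_check("FULL", REPORT_FORMATS)    # passes silently
try:
    validity_check("LONG", REPORT_FORMATS)
except ValueError as err:
    print("error condition:", err)        # error condition: 'LONG' is not a valid argument
```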
A sum checking routine (checksum) checks file storage. When the
monitor assembles a group of blocks to write contiguously on the disk,
it computes a checksum of the first word of that group and saves that
checksum in the retrieval information block (RIB). If, when the group
is read back, the saved checksum does not match the checksum of the
first word read, the monitor assumes it read the wrong block. If
there are no hardware errors, this is the best assumption. These
errors probably indicate a disk addressing failure.
If the monitor crashes before it is able to write the new RIB of an
old file, the checksum may change in core but not on disk. An obscure
software problem may also be responsible. Reproducing the error is
one way for you to narrow the problem down. Also check the crash log
and look for other error types.
Note that a checksum error is not a substitute for parity. Its
purpose is to make sure that a data set was written in the right
place. If it was not, either the software failed to keep track of the
data, or the hardware failed to address the correct place.
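The first-word checksum scheme can be sketched as follows. This manual does not describe the monitor's actual checksum algorithm, so the fold function here (XOR of the two 18-bit halves of a 36-bit word) is a hypothetical stand-in, and the RIB is modeled as a plain dictionary.

```python
MASK18 = (1 << 18) - 1


def fold(word: int) -> int:
    """Hypothetical checksum: XOR the two 18-bit halves of a 36-bit word."""
    return ((word >> 18) ^ word) & MASK18


def write_group(blocks: list[list[int]], rib: dict) -> None:
    """Write a contiguous group; save the checksum of its first word in the RIB."""
    rib["checksum"] = fold(blocks[0][0])


def read_ok(first_word_read: int, rib: dict) -> bool:
    """On read-back, a mismatch suggests that the wrong block was read."""
    return fold(first_word_read) == rib["checksum"]


rib: dict = {}
write_group([[0o123456765432, 0], [0, 0]], rib)
print(read_ok(0o123456765432, rib))  # True: the expected block came back
print(read_ok(0o123456765433, rib))  # False: a different block was read
```

Because only the first word is checked, this catches a block read from the wrong place; it says nothing about the rest of the data, which is what parity is for.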
A loop checking routine counts the number of times a program passes
through a loop and reports an error when a maximum count is reached,
indicating that the loop is unable to reach a decision.
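A minimal loop-check sketch, assuming the loop body is expressed as a step function that reports whether a decision has been reached. The names run_loop, halve, and LoopCheckError and the default limit of 100 passes are illustrative choices, not values taken from any DIGITAL software.

```python
class LoopCheckError(RuntimeError):
    """Stands in for the error condition a loop check reports."""


def run_loop(step, state, max_passes: int = 100):
    """Call step(state) until it reports a decision; fail after max_passes."""
    for passes in range(1, max_passes + 1):
        decided, state = step(state)
        if decided:
            return state, passes
    raise LoopCheckError(f"no decision after {max_passes} passes")


def halve(n: int):
    """Example step: decided once n is down to 1, otherwise halve it."""
    return (n <= 1, n if n <= 1 else n // 2)


print(run_loop(halve, 16))  # (1, 5)
```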
Any time one of these error conditions is set, the operating system
records the event in the system event file. You can check on these
events by using the SPEAR library.
3.4 ISOLATION TECHNIQUES
When you are faced with the problem of finding the cause of an
intermittent failure, you should take the time to define the problem.
First check the symptoms:
1. What is not happening that should?
2. What is happening that should not?
3. What are the conditions and circumstances?
Possible causes of intermittent failures include the following:
1. An environmental violation (power, grounding, temperature,
humidity, contamination)
2. A damaged, defective, or worn component
3. A faulty mechanical or electrical connection
4. A mechanical misalignment
5. An electrical misadjustment
6. A software design oversight
7. A hardware design oversight
What you have to work with are the symptoms of the failure and the
SPEAR library of functions. Hopefully, the system operator has been
running SPEAR analysis on a daily basis so that you can get a picture
of the conditions leading up to the problem. If not, you can run
SPEAR and receive a report within a short period of time. With SPEAR
analysis and reported symptoms, you should be able to venture a guess
as to the cause of the problem. You might even be able to pinpoint
the failure right away. If you are not that fortunate, your next plan
of action is to do the following:
1. Devise an experiment
2. Predict the results
3. Conduct the experiment
4. Evaluate the results
5. Refine the experiment
6. Repeat the process
For example, if you suspect that a disk pack is bad, move the pack to
another disk drive. If the media is bad, the error pattern will move
to the other drive. Once you believe you have isolated the failure,
you should confirm your findings. After moving the disk pack, run the
system for a couple of days. Check to see if the same error patterns
occur on the second drive.
3.4.1 Verification
There are two general methods of verifying your findings. The first
method is to reinsert the problem. If the symptoms recur, you can be
relatively sure that you have identified the cause of the problem,
thereby verifying your findings. If the symptoms do not recur, you
should proceed with the second method.
The second method is called the time window. You should use the time
window for intermittent problems or when reinserting the probable
cause is not feasible; that is, when reinserting would be too time
consuming or potentially damaging to the system.
The time window is simply a period of time during which you closely
monitor the performance of the system. If the problem does not recur
during that period, then you assume the problem is solved, and your
findings are verified.
The duration of the time window depends on whether the problem was
solid or intermittent. If the problem was solid, monitor the system
for 24 hours. If the problem was intermittent, monitor the system for
at least three times the usual interval between occurrences of the
error. Experience will dictate the method that works best for you.
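The time-window rule of thumb can be expressed as a small calculation. This sketch assumes the interval between occurrences is estimated as the mean gap between logged error times, for example times read from a RETRIEVE report; time_window is a hypothetical name.

```python
from datetime import datetime, timedelta


def time_window(solid: bool, error_times: list[datetime]) -> timedelta:
    """Monitoring period after a repair, following the rule of thumb above."""
    if solid:
        return timedelta(hours=24)
    if len(error_times) < 2:
        raise ValueError("need at least two occurrences to estimate the interval")
    times = sorted(error_times)
    mean_interval = (times[-1] - times[0]) / (len(times) - 1)
    return 3 * mean_interval


occurrences = [
    datetime(1984, 2, 23, 3, 12, 43),
    datetime(1984, 2, 24, 3, 12, 43),
    datetime(1984, 2, 25, 3, 12, 43),
]
print(time_window(False, occurrences))  # 3 days, 0:00:00
```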
Your site may have its own specific isolation and verification
techniques that are tried and true. If so, stay with the most
successful method.
CHAPTER 4
SPEAR FUNCTIONS
4.1 INTRODUCTION
The previous chapters introduced you to SPEAR, described where SPEAR
gets its information, and listed techniques for intermittent fault
isolation. This chapter explains how to use the SPEAR dialogue with
| its help facilities and describes the functions in the SPEAR library:
1. RETRIEVE
2. SUMMARIZE
|
| 3. KLSTAT (TOPS-20 only)
SPEAR is designed so that, after you have used it a few times, you
can run through it without any problems. This ease of use comes from
the way you interact with SPEAR: its dialogue prompts you and helps
you along as much as you want.
4.2 RUNNING SPEAR
To run SPEAR, first log in to your operating system, then type one of
the following:
.R SPEAR On TOPS-10 based systems
@SPEAR On TOPS-20 based systems
SPEAR indicates that it is waiting for instructions by displaying the
following prompt:
SPEAR>
After you see the SPEAR prompt, you can type any one of the function
names (KLSTAT is available on TOPS-20 only), HELP, a question mark
(?), or EXIT to return to operating system command level. When you
type a function name, you need type only enough characters to make it
unique to SPEAR. In this case, the first character of the name is
enough for SPEAR to recognize it.
If you type a question mark (?) at this point, SPEAR prints a list of
the features available to you in your version of the SPEAR Library.
CAUTION
The SPEAR library is not transportable across
operating systems. You cannot run SPEAR for TOPS-10
on TOPS-20, or vice versa. Consequently, you cannot use
the system event file from one operating system with a
SPEAR library from another system.
SPEAR has several features to guide you in its use. The following
subsections describe these features.
4.2.1 Prompts, Responses, and Arguments
Each function of SPEAR has several levels of questions for you to
answer. SPEAR prompts you and gives you a selection of acceptable
responses. The default is listed in parentheses with each prompt.
If you have been through this before, you can speed up the process by
responding to all the prompts on the first line, separating the
responses with legal separators, or by specifying an indirect file
containing your responses.
SPEAR can process commands from a disk file as well as from your
terminal. This disk file, known as an indirect file, is useful if you
have a set of responses you often use. To use this feature, use a text
editor at operating system command level to create a disk file. The
file should contain the responses that you would normally type to
SPEAR at the terminal.
NOTE
Be sure to delete any line-sequence numbers from your
indirect file. SPEAR will not accept them.
Once you have created the file and saved it in your disk area, all you
need to do is to run SPEAR and type the file name preceded by an at
sign (@). The at sign (@) signifies an indirect file. The default
file name for an indirect file is SPRCMD.CMD. Note that you can
specify an indirect file at any prompt level of SPEAR, as long as the
file contains only the remaining information necessary to complete the
SPEAR requests.
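The following Python sketch models how an indirect file might be turned into a list of responses. It is not SPEAR's code. Three points are assumptions: a bare @ takes the default name SPRCMD.CMD, each non-blank line holds one response, and a line-sequence number shows up in the text as five digits followed by a tab. The function names are hypothetical.

```python
import re

DEFAULT_INDIRECT = "SPRCMD.CMD"
# Assumption: an editor line-sequence number appears as five digits and a tab.
LINE_SEQUENCE_NUMBER = re.compile(r"\d{5}\t")


def indirect_file_name(token: str) -> str:
    """'@DAILY.CMD' names an indirect file; a bare '@' is assumed to mean the default."""
    if not token.startswith("@"):
        raise ValueError("an indirect file reference starts with @")
    return token[1:] or DEFAULT_INDIRECT


def read_responses(text: str) -> list[str]:
    """Return one response per non-blank line, refusing line-sequence numbers."""
    responses = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LINE_SEQUENCE_NUMBER.match(line):
            raise ValueError(f"line {number}: delete line-sequence numbers first")
        if line.strip():
            responses.append(line.strip())
    return responses


print(read_responses("SERR:ERROR.SYS\nINCLUDED\n\nERROR\n"))
# ['SERR:ERROR.SYS', 'INCLUDED', 'ERROR']
```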
You can choose to be prompted at every step or decide to supply all
required information without prompting. In fact, at SPEAR command
level, you can input an entire SPEAR session on one line, separating
each field with a space. For example:
SPEAR>RETRIEVE A0916.PAK 5,6,10 ASCII FULL /G<RET>
By using special characters as separators, you can also speed up the
process within the SPEAR dialogue. Section 4.2.2 describes these
characters.
4.2.2 Separators and Terminators
The following characters and terminal keys have special meaning to
SPEAR:
1. The RETURN key <RET> - indicates that you have completed
input to a SPEAR prompt in one way or another. You have
either input your own arguments or taken the default.
2. A comma (,) - indicates that you are inputting a list of
items within one request for input, for example a list of
sequence numbers or packet identifiers.
3. A colon (:) - indicates that you have either input a device
name within a file specification or you have specified
devices within an error type specification.
4. A plus sign (+) - separates more than one major error type on
one line.
5. A semicolon (;) - indicates that the next argument is a
version number in a file specification.
6. An exclamation point (!) - allows you to insert comments.
SPEAR ignores anything it sees on the current line after an
exclamation point.
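The separators listed above can be combined in one response line. The following Python sketch splits such a line, assuming a grammar in which "+" separates major types, ":" introduces a device list, "," separates list items, and "!" starts a comment. This grammar is inferred from the descriptions in this section. parse_selection is a hypothetical name, and filling in ALL for a missing device list mirrors the (ALL) defaults shown in the prompts.

```python
def parse_selection(line: str) -> dict[str, list[str]]:
    """Split 'DISK:RP06,RM03+TAPE:TU77 ! note' into a type-to-devices map."""
    line = line.split("!", 1)[0]                  # '!' starts a comment
    result: dict[str, list[str]] = {}
    for part in line.split("+"):                  # '+' separates major types
        part = part.strip()
        if not part:
            continue
        kind, _, devices = part.partition(":")    # ':' introduces devices
        names = [d.strip() for d in devices.split(",") if d.strip()]  # ',' lists
        result[kind.strip().upper()] = names or ["ALL"]
    return result


print(parse_selection("DISK:RP06,RM03+TAPE:TU77 ! nightly check"))
# {'DISK': ['RP06', 'RM03'], 'TAPE': ['TU77']}
```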
4.2.3 Help Features
There are five major help features in SPEAR, the question mark (?),
the HELP command, the @HELP command, the question mark switch (/?),
and the /HELP switch.
1. The question mark (?) provides enough information to refresh
your memory about the acceptable responses.
2. The HELP command provides detailed information on both the
prompt and on acceptable commands.
3. The @HELP command displays information concerning indirect
files.
4. The question mark switch (/?) provides a list of switches you
can type as response to a particular prompt.
5. The /HELP switch provides an explanation of the acceptable
switches that you can type as response to a particular
prompt.
You can type any of these help features after any prompt in the SPEAR
dialogue and also after you have typed a response to the prompt. For
example, if you type a question mark in response to a prompt, SPEAR
does the following:
1. Lists all acceptable responses.
2. Gives a brief description of the desired response if it is
general (for example, file specification).
If you type a question mark after supplying characters to a prompt,
SPEAR lists all acceptable responses matching the characters typed.
You can also type the HELP command after any prompt. SPEAR prints up
to 22 lines of information about the use of the prompt.
The Escape key is another help feature in the SPEAR library. The
Escape key fills in a response if you type enough characters for SPEAR
to know what you want. For example:
Output mode (ASCII):B<ESC>INARY
If you do not supply enough information before typing <ESC>, SPEAR
prompts you for more input by sending a bell to the terminal. If you
press <ESC> without typing any characters in response to a prompt,
SPEAR fills in the default response. For example:
Event file (SERR:ERROR.SYS):<ESC>SERR:ERROR.SYS
The following keys can also help you through the SPEAR dialogue:
1. CTRL/U - deletes the current input line
2. CTRL/W - deletes back to the last punctuation character
3. CTRL/F - completes the next field of a file specification
with the default
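The Escape-key behavior described above, and the rule that a function name at the SPEAR> prompt needs only enough characters to be unique, can be modeled in Python as follows. This is only a model, not SPEAR's implementation. complete is a hypothetical name, keywords are assumed to be uppercase, and a return value of None stands for the point where SPEAR rings the terminal bell.

```python
from typing import Optional


def complete(typed: str, keywords: list[str], default: str) -> Optional[str]:
    """Model of <ESC>: empty input takes the default, a unique prefix is
    filled in, and anything else returns None (where SPEAR rings the bell)."""
    if not typed:
        return default
    typed = typed.upper()
    if typed in keywords:
        return typed
    matches = [k for k in keywords if k.startswith(typed)]
    return matches[0] if len(matches) == 1 else None


print(complete("B", ["ASCII", "BINARY"], "ASCII"))  # BINARY
print(complete("", ["ASCII", "BINARY"], "ASCII"))   # ASCII
print(complete("N", ["NI", "NETWORK"], "ALL"))      # None
```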
4.2.4 File Specifications
The following are the formats of the file specifications that can be
given in a SPEAR command string. These formats are listed according
to operating system:
TOPS-10 dev:filename.file extension[directory]
TOPS-20 dev:<directory>filename.file type.file version
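The two formats above can be matched with regular expressions. This Python sketch shows how the fields fit together. The character sets are simplified assumptions, since the real monitors accept more forms. TOPS20, TOPS10, and parse_spec are hypothetical names.

```python
import re

# Character sets are simplified assumptions; real monitors accept more.
TOPS20 = re.compile(
    r"(?:(?P<dev>\w+):)?"               # dev:
    r"(?:<(?P<directory>[^>]+)>)?"      # <directory>
    r"(?P<name>[\w$-]+)"                # filename
    r"(?:\.(?P<type>[\w$-]*))?"         # .file type
    r"(?:\.(?P<version>\d+))?"          # .file version
)
TOPS10 = re.compile(
    r"(?:(?P<dev>\w+):)?"               # dev:
    r"(?P<name>\w+)"                    # filename
    r"(?:\.(?P<extension>\w*))?"        # .file extension
    r"(?:\[(?P<directory>[^\]]+)\])?"   # [directory]
)


def parse_spec(spec: str, system: str) -> dict:
    """Split a file specification into its fields for TOPS-20 or TOPS-10."""
    pattern = TOPS20 if system == "TOPS-20" else TOPS10
    match = pattern.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a {system} file specification: {spec}")
    return match.groupdict()


print(parse_spec("SERR:<SYSTEM>ERROR.SYS.5", "TOPS-20"))
print(parse_spec("DSK:RETRIE.RPT[10,7]", "TOPS-10"))
```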
4.2.5 SPEAR Switches
The following is a list of the switches available in SPEAR. Note that
the square brackets indicate optional information that you can omit.
You do not type the square brackets.
/? lists the available switches.
/B[REAK] returns you to the SPEAR> prompt.
/G[O] executes the current SPEAR command with the
parameters you have given so far. It takes the
defaults for the rest of the parameters. This is
the default switch.
/H[ELP] lists the available switches and gives a brief
explanation of their uses.
/R[EVERSE] returns you one level back to the previous prompt,
where you can change any parameters.
/S[HOW] shows all the parameters you have specified so far
and fills in the defaults for the ones you have
not specified.
The following is an example (from TOPS-10) using the /SHOW switch with
the RETRIEVE and SUMMARIZE commands. Note that all the defaults are
shown because no other parameters have been specified.
SPEAR> SUMMARIZE/SHOW
Event file: SYS:ERROR.SYS
Report to: DSK:SUMMAR.RPT
| Time from: 8-Mar-83
Time to: LATEST
| Show Error Distribution: YES
SPEAR> RETRIEVE/SHOW
Event or packet file: SYS:ERROR.SYS
Output to: DSK:RETRIE.RPT
Merge with: NONE
Time from: EARLIEST
Time to: LATEST
Selection to be: INCLUDED
Output mode: ASCII
Report format: SHORT
Selection type: ALL
SPEAR> RETRIEVE/REVERSE
SPEAR> EXIT
.
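The bracket notation in the switch list means that the part of a switch outside the brackets is the minimum you must type. Every switch is therefore recognized by its first letter. The following Python sketch expands abbreviated switches under that reading. SWITCHES and parse_switch are hypothetical names.

```python
# Minimum abbreviations from the bracket notation in the switch list.
SWITCHES = {"?": "?", "B": "BREAK", "G": "GO", "H": "HELP", "R": "REVERSE", "S": "SHOW"}


def parse_switch(text: str) -> str:
    """Expand '/S', '/SHO', or '/SHOW' to 'SHOW'; reject anything else."""
    if not text.startswith("/") or len(text) < 2:
        raise ValueError(f"not a switch: {text!r}")
    word = text[1:].upper()
    full = SWITCHES.get(word[0])
    if full is None or not full.startswith(word):
        raise ValueError(f"unknown switch: {text!r}")
    return full


print(parse_switch("/G"))    # GO
print(parse_switch("/rev"))  # REVERSE
```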
4.2.6 Exiting from SPEAR
To exit from SPEAR, first return to the SPEAR> prompt by typing
/BREAK. Then type the EXIT command. You can also exit from SPEAR by
typing CTRL/C at any prompt.
4.3 RETRIEVE
RETRIEVE converts the entries in the system event file from internal
binary format to a readable ASCII format. It also allows you to
select specific entries from the system event file and save them in a
separate file.
4.3.1 RETRIEVE Input
RETRIEVE accepts the following types of input:
1. The system event file
2. A file created by the RETRIEVE process
3. Any file containing entries from the system event file
With RETRIEVE, you have the option of translating the entire system
event file or specific entries in the file by sequence number. In
order to have more control over the selection of specific types of
| entries, you can use RETRIEVE to extract the entry types in which you
| are interested and then translate them.
| You can select entries on the basis of the following:
1. Date/time limits
2. Sequence numbers
3. Event codes
|
| 4. Error
|
| 5. Statistics
|
| 6. Configuration
|
| 7. Diagnostics
|
| Error, Statistics, Configuration and Diagnostics can be further
| subdivided into the following categories:
1. Mainframe (CPU, memory, front-end)
|
| 2. Disk
|
| 3. Tape
|
| 4. CI
|
| 5. NI
|
| 6. Unit record
|
| 7. Network
|
| 8. Operating system
9. Disk pack identifier
10. Tape reel identifier
| Once you have defined a category, you can specify physical names or
| device types within it, such as LPT for unit record devices.
Table 4-1 lists the available device types that you can specify:
| Table 4-1: Device Types
|
|
Category Device Types
Mainframe ALL, MEM, FE, CPU
Disk ALL, RM03, RM05, RP04, RP05, RP06, RP07,
| RS04, RP20, RA60, RA80, RA81
| Tape ALL, TU16, TU45, TU70, TU71, TU72, TU73,
| TU77, TU78, TA78
|
| CI CI20, HSC50
Unit Record ALL, LPT, CDR
Network ALL, Decimal number in range 0-511 (see Table
4-2)
| Table 4-2 lists the classes available for selection of DECnet events:
|
|
| Table 4-2: Network Event Classes
|
|
Class Description
0 Management layer
1 Application layer
2 Session Control layer
3 Network services layer
4 Transport layer
5 Data link layer
6 Physical link layer
007-031 Reserved for other common event classes
032-063 Reserved for RSTS specific event classes
064-095 Reserved for RSX specific event classes
096-127 Reserved for TOPS-20 specific event classes
128-159 Reserved for VMS specific event classes
160-191 Reserved for RT specific event classes
192-479 Reserved for future use
480-511 Reserved for Customer specific event classes
| For more information concerning network entries from DECnet V3.0,
| refer to the DECnet documentation for system managers and operators.
|
| If you specify Error as an entry selection, you can also specify an
| error type. See Table 4-4 for a list of error types.
4.3.2 RETRIEVE Output
RETRIEVE output can be in the following forms:
1. One or two lines containing the most pertinent data in ASCII
format.
2. All data about each event, in ASCII format.
3. All data about each event in octal dump format. This format
is useful only for debugging the error-reporting system.
4. Specific events saved in binary format, for future reference.
The default output file is RETRIE.RPT for ASCII output or RETRIE.SYS
for binary output.
You should be aware that user-defined entries that are unknown to
SPEAR cannot be translated into ASCII. You can, however, get an octal
dump of these entries by specifying OCTAL to the Report format prompt
when running RETRIEVE.
An unusual event you may find in the system event file is a KLERR
entry. KLERR entries differ from most entries in that several event
file records make up one complete entry. This is because the
front-end must send the information in pieces through the DTE
interface, along with all communications, console, and hard-copy
data. Because of this, some records may not actually reach the event
file. When SPEAR finds that a KLERR entry is incomplete, it types a
non-fatal error message and translates all available data anyway.
Each record of a KLERR entry uses its own sequence number. As a
result, you may notice gaps between sequence numbers in a RETRIEVE
report even if you have selected ALL entries. A KLERR entry is listed
under the sequence number of its first record, but it is not listed
until all records of the entry have been received. Because other
entries may enter the event file before the front-end has sent all
records of one KLERR entry, the KLERR entry will appear to be out of
sequence.
For example, you may find entries with the following sequence numbers:
1. Configuration status change
3. Disk error
6. Tape error
2. KLERR
8. Reload
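The ordering above can be reproduced with a small Python model. It assumes that sequence numbers 4, 5, and 7 are the remaining records of the KLERR entry that begins at 2. The example does not say this. The model also assumes that only one KLERR entry is in transit at a time and ignores incomplete entries. Record and listing_order are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    seq: int
    kind: str
    klerr_part: Optional[str] = None  # "first", "more", "last", or None


def listing_order(records: list[Record]) -> list[tuple[int, str]]:
    """List a KLERR entry only when its last record arrives, under the
    sequence number of its first record (one KLERR in transit at a time)."""
    listed = []
    first_seq: Optional[int] = None
    for rec in records:
        if rec.klerr_part is None:
            listed.append((rec.seq, rec.kind))
        elif rec.klerr_part == "first":
            first_seq = rec.seq
        elif rec.klerr_part == "last" and first_seq is not None:
            listed.append((first_seq, "KLERR"))
            first_seq = None
    return listed


event_file = [
    Record(1, "Configuration status change"),
    Record(2, "KLERR", "first"),
    Record(3, "Disk error"),
    Record(4, "KLERR", "more"),
    Record(5, "KLERR", "more"),
    Record(6, "Tape error"),
    Record(7, "KLERR", "last"),
    Record(8, "Reload"),
]
print([seq for seq, _ in listing_order(event_file)])  # [1, 3, 6, 2, 8]
```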
For step-by-step procedures for using RETRIEVE, refer to Section
4.3.3.
4.3.3 RETRIEVE Procedure
RETRIEVE lets you convert events in the system event file into an
ASCII format for listing on the terminal or lineprinter. To begin
with, RETRIEVE prompts with one or more of the following guidewords:
RETRIEVE Mode
_____________
Event or packet file (SERR:ERROR.SYS):
Selection to be (INCLUDED):
Selection type (ALL):
Sequence numbers:
Event codes:
| Category (ALL):
|
| Next category (FINISHED):
Mainframe devices (ALL):
Disk drives (ALL):
Tape drives (ALL):
| CI controller (ALL):
Unit record devices (ALL):
| Disk (structure IDs):
|
| Tape (reel IDs):
Time from (EARLIEST):
Time to (LATEST):
Output mode (ASCII):
Merge with (NONE):
Report format (SHORT):
Output to (DSK:RETRIE.RPT):
4.3.3.1 Retrieving Selected Events - If you want to take all the
defaults, type R/G to the SPEAR> prompt; otherwise, read the following
procedure.
STEP 1
| After you type RETRIEVE at the SPEAR> prompt, RETRIEVE asks for the
| name of the input file:
Event or packet file (SERR:ERROR.SYS): TOPS-20
or
Event or packet file (SYS:ERROR.SYS): TOPS-10
Type one of the following:
1. The RETURN key - to select the default, the system event
file.
2. Any file name, in the proper format, containing events stored
in binary.
3. The name of a previous file that you RETRIEVEd in BINARY
mode.
STEP 2
RETRIEVE then prompts for the method of selection:
Selection to be (INCLUDED):
Type one of the following:
1. The RETURN key - to select the default I[NCLUDED]. INCLUDED
moves a few selected entries of various types into a separate
file.
2. E[XCLUDED] - to select all but a few entry types.
STEP 3
After selecting INCLUDED or EXCLUDED, you receive the following
prompt:
Selection type (ALL):
| At this prompt, you have two separate lists from which to choose.
| Type one or more of the following from the first group:
|
| 1. E[RROR] - to select entries that contain actual failure data.
|
| 2. ST[ATISTICS] - to select statistic entries.
|
| 3. D[IAGNOSTICS] - to select entries created by a diagnostic.
|
| 4. CON[FIGURATION] - to select configuration entries.
|
| 5. O[THER] - to select entries that do not fit into the other
| types.
|
| If you choose more than one of these types, separate each with a
| comma.
|
| Or type one of the following from the second group:
|
| 1. The RETURN key or A[LL] - to select the default that extracts
| all entries. You will be asked for date and time limits
| next.
|
| 2. SE[QUENCE] - to select entries by sequence number.
|
| If you choose SEQUENCE, RETRIEVE prompts further with:
|
| Sequence numbers:
|
| Here you can specify one number, several numbers separated by
| commas, or a range of numbers separated by a hyphen.
|
| 3. COD[E] - to select entries on the basis of their octal code
| number. These numbers are listed in Table D-1 and in the
| SPEAR Reference card.
|
| If you choose CODE, RETRIEVE prompts you further with:
|
| Event codes:
|
| Here you can specify one number, several numbers separated by
| commas, or a range of numbers separated by a hyphen.
|
| If you chose ERROR, STATISTICS, DIAGNOSTICS, CONFIGURATION, OTHER, or
| a combination of these, proceed with Step 3A. If you chose ALL or
| CODE, proceed to Step 4. If you chose SEQUENCE, proceed to Step 6.
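The Sequence numbers and Event codes prompts accept a single number, a comma-separated list, or a hyphenated range. The following Python sketch expands such a response. It assumes that ranges include both endpoints and that event codes are read in octal, as the description of CODE suggests. parse_numbers is a hypothetical name.

```python
def parse_numbers(text: str, base: int = 10) -> list[int]:
    """Expand '5,6,10' or '1240-1243,1249' into a sorted list of numbers.
    Use base=8 for event codes, which are octal."""
    numbers: set[int] = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        low, sep, high = item.partition("-")
        start = int(low, base)
        end = int(high, base) if sep else start
        if end < start:
            raise ValueError(f"backward range: {item}")
        numbers.update(range(start, end + 1))  # both endpoints included
    return sorted(numbers)


print(parse_numbers("5,6,10"))            # [5, 6, 10]
print(parse_numbers("1240-1243,1249"))    # [1240, 1241, 1242, 1243, 1249]
print(parse_numbers("11-13", base=8))     # [9, 10, 11]
```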
|
| STEP 3A
|
| If you choose ERROR, STATISTICS, DIAGNOSTICS, CONFIGURATION, OTHER,
| or a combination of these types, you receive the following prompt:
|
| Category (ALL):
|
| Type one of the following:
|
| 1. The RETURN key or A[LL] - to select all the categories. This
| is the default.
|
| 2. M[AINFRAME] - to select errors occurring in specific
| mainframe components.
|
| 3. D[ISK] - to select entries occurring on disk subsystems or
| individual drives.
|
| 4. T[APE] - to select entries occurring on tape subsystems or
| individual drives.
|
| 5. CI - to select entries occurring on the CI interconnect or
| the HSC50 disk controller.
|
| 6. NI - to select entries occurring on the NI.
|
| 7. U[NITRECORD] - to select entries occurring on unit-record
| devices such as card readers and line printers.
|
| 8. NE[TWORK] - to select entries occurring on the network nodes.
|
| 9. O[PERATING-SYSTEM] - to select entries that are software
| related.
|
| 10. CO[MM] - to select entries occurring on communications
| devices.
|
| 11. P[ACKID] - to select entries occurring on specific disk
| packs.
|
| 12. R[EELID] - to select entries occurring on specific tape
| reels.
|
| All categories except COMM and NI prompt further for specific device
| types. Table 4-3 lists the subprompts you can expect:
| Table 4-3: Subprompts for Device Types
|
|
| Device Type Subprompt
|
| MAINFRAME Mainframe devices (ALL):
| DISK Disk drives (ALL):
| TAPE Tape drives (ALL):
| CI CI controllers (ALL):
| UNITRECORD Unit record devices (ALL):
| NETWORK Event class and type (ALL):
| OPERATING-SYSTEM Operating System codes (ALL):
| PACKID Disk (structure IDs):
| REELID Tape (reel IDs):
|
|
| Type ? at the subprompt level to get a list of acceptable responses,
| or refer to Table 4-1 in this manual.
|
| If you chose ERROR as one of the selection types in STEP 3, you can
| also specify the particular error types for which you are looking in
| relation to the specific device. Table 4-4 lists the error types for
| the devices:
|
|
| Table 4-4: Error Types
|
|
| Prompts Error Types
|
| Disk error type (ALL): OFFLINE
| WRITE-LOCK
| UNSAFE
| MICROPROCESSOR
| SOFTWARE
| BUS
| CHANNEL-CONTROLLER
| READ-WRITE
| SEEK-SEARCH
| TIMING
| OTHER
|
| Tape error type (ALL): READ
| WRITE
| DEVICE-FORMATTER
| BUS
| CHANNEL-CONTROLLER
| SOFTWARE
| OFFLINE
| OPERATOR
| OTHER
|
| CI error type (ALL):
| for CI20 EBUS
| MBUS
| CRAM-PARITY
| CHANNEL-ERROR
| SERDES-OVERRUN
| EDS
| INCONSISTENT-DATA
| CI error type (ALL):
| for HSC50 SERDES-OVERRUN
| EDC
| INCONSISTENT-DATA
|
| NI error type (ALL): EBUS
| MBUS
| CRAM-PARITY
| CHANNEL-ERROR
|
|
STEP 3B
| RETRIEVE keeps prompting you for categories until you either type
FINISHED or press the RETURN key:
| Next category (FINISHED):
Type one of the following:
|
| 1. The RETURN key or F[INISHED] to take the default.
|
| 2. Another category.
Note that you can select disk entries by either DISK or PACKID and
| tape entries by either TAPE or REELID. If you are interested in
media, use PACKID or REELID; otherwise, use DISK or TAPE. If you
specify both DISK and PACKID (or TAPE and REELID), you select all disk
entries (or tape entries), not just those that match the selected
media. If you want to select entries with a specific device and
media, you must run RETRIEVE twice.
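The difference between one pass and two passes can be modeled in Python. Within one pass, this sketch treats criteria from different categories as alternatives (OR), so DISK and PACKID together widen the selection. Two passes narrow it. In practice the first pass would write a binary file that the second pass reads, because RETRIEVE accepts files it created as input. The data comes from the short-format sample report in this chapter. DiskEntry and retrieve are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskEntry:
    seq: int
    drive_type: str  # for example "RP06"
    pack_id: str     # structure ID; empty if none was logged


def retrieve(entries, drive_types=frozenset(), pack_ids=frozenset()):
    """One pass: an entry is selected if it matches DISK or PACKID."""
    return [e for e in entries
            if e.drive_type in drive_types or e.pack_id in pack_ids]


log = [
    DiskEntry(1249, "RP07", "WORK"),
    DiskEntry(1713, "RP06", ""),
    DiskEntry(1875, "RP06", "SERR"),
    DiskEntry(328, "RP06", "PUBLIC"),
    DiskEntry(372, "RP06", "SERR"),
    DiskEntry(85, "RP07", "GALAXY"),
]

one_pass = retrieve(log, drive_types={"RP07"}, pack_ids={"SERR"})
print([e.seq for e in one_pass])  # [1249, 1875, 372, 85]

# Two passes: the first keeps RP07 entries, the second narrows to pack WORK.
second = retrieve(retrieve(log, drive_types={"RP07"}), pack_ids={"WORK"})
print([e.seq for e in second])  # [1249]
```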
| You can specify more than one device name by separating them with
commas. For example:
| Disk drives (ALL):DISK:RP06,RM03,RP05
You can always come back to error category selection (by using
/REVERSE) to add parameters. Everything typed here remains until you
| type CTRL/U or CTRL/W.
| Note that supplying a device type (RP06, RM03) causes SPEAR to search
a different field than if you had supplied a physical name (DP130,
MTA1, and so forth). If the name you supply does not match one of the
known device types, SPEAR assumes that it is a physical name.
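The name-matching rule can be sketched as follows. A name found in the list of known device types is compared with an entry's type field. Any other name is treated as a physical unit name. Only the disk types from Table 4-1 are included here. field_for and selected are hypothetical names.

```python
# Disk device types from Table 4-1; other categories are omitted for brevity.
DISK_TYPES = {"RM03", "RM05", "RP04", "RP05", "RP06", "RP07",
              "RS04", "RP20", "RA60", "RA80", "RA81"}


def field_for(name: str) -> str:
    """Known device types select the type field; anything else is a unit name."""
    return "drive_type" if name.upper() in DISK_TYPES else "unit"


def selected(unit: str, drive_type: str, names: list[str]) -> bool:
    """True if any supplied name matches this entry in the field it selects."""
    fields = {"unit": unit.upper(), "drive_type": drive_type.upper()}
    return any(fields[field_for(n)] == n.upper() for n in names)


print(selected("DP100", "RP07", ["RP07"]))           # True: matched by type
print(selected("DP100", "RP07", ["DP100"]))          # True: matched by name
print(selected("DP040", "RP06", ["RP07", "DP130"]))  # False
```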
STEP 4
RETRIEVE then prompts you for the date and time limits of the entries
you want to select:
Time from (EARLIEST):
Type one of the following:
1. The RETURN key or E[ARLIEST] - to select the beginning of the
file. This is the default.
2. A date and time in the format dd-mmm-yy hh:mm:ss - to signify
where to begin extracting entries. A date by itself defaults
to one second after midnight.
3. A relative date in the format -nn to indicate a number of
days prior to the current date. For example, -7 causes
RETRIEVE to begin extracting entries from seven days prior to
the current day.
STEP 5
RETRIEVE then prompts for the end of the time period:
Time to (LATEST):
Type one of the following:
1. The RETURN key or L[ATEST] - to select the end of the file.
This is the default.
2. A date and time in the format dd-mmm-yy hh:mm:ss - to
indicate the last date for extracted entries. A date by
itself defaults to one second after midnight.
3. A relative date in the format -nn to indicate a number of
days prior to the current date. For example, -13 causes
RETRIEVE to stop extracting entries recorded thirteen days
before the current date.
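The responses accepted by Time from and Time to can be interpreted as in the following Python sketch. None stands for EARLIEST or LATEST. The manual does not say which time of day a -nn offset uses, so keeping the current time of day is an assumption here. parse_time is a hypothetical name, and an English (C) locale is assumed for month abbreviations.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_time(text: str, now: datetime) -> Optional[datetime]:
    """Interpret a Time from/Time to response; None means EARLIEST/LATEST."""
    text = text.strip().upper()
    if not text or "EARLIEST".startswith(text) or "LATEST".startswith(text):
        return None
    if text.startswith("-") and text[1:].isdigit():
        # Assumption: the offset keeps the current time of day.
        return now - timedelta(days=int(text[1:]))
    try:
        return datetime.strptime(text, "%d-%b-%y %H:%M:%S")
    except ValueError:
        # A date by itself defaults to one second after midnight.
        return datetime.strptime(text, "%d-%b-%y").replace(second=1)


now = datetime(1984, 3, 6, 15, 57, 46)
print(parse_time("23-Feb-84", now))  # 1984-02-23 00:00:01
print(parse_time("-7", now))         # 1984-02-28 15:57:46
```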
STEP 6
RETRIEVE next prompts for the style of output:
Output mode (ASCII):
Type one of the following:
1. The RETURN key or A[SCII] - to convert entries into ASCII
format. This is the default.
2. B[INARY] - to retain the entries in their internal format.
If you choose ASCII, proceed to STEP 7. If you choose BINARY, skip to
STEP 8.
STEP 7
After choosing ASCII, RETRIEVE prompts you for the form of your
output:
Report format (SHORT):
Type one of the following:
1. The RETURN key or S[HORT] - to select the default. This
selection produces a report with only the most essential
information. No entry will be longer than three lines of 72
columns.
2. F[ULL] - to display all the information that the operating
system recorded for that entry.
3. O[CTAL] - to produce an ASCII report that shows each entry
as a dump of octal words. The octal values represent the
actual binary contents of the entry. Unless you are familiar
with the internal format of the individual entries, this
format has very little value. Its primary purpose is to aid
in debugging the SPEAR program library.
STEP 8
If you specified BINARY as output style, RETRIEVE then prompts for
another file name to give you an opportunity to combine two files into
one for record-keeping purposes. The merged output file will be in
the proper chronological order. Both files must be in binary format.
The prompt is:
Merge with (NONE):
Type one of the following:
1. The RETURN key - to select the default of NONE.
2. A file name of another file containing entries from the
system event file.
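Merging two files in chronological order is a standard merge of two time-ordered sequences. The following Python sketch assumes each file has already been decoded into a time-ordered list of (timestamp, entry) pairs. Decoding the binary format is outside its scope. merge_by_time is a hypothetical name, and the timestamps come from the sample reports in this chapter.

```python
import heapq
from datetime import datetime


def merge_by_time(first, second):
    """Merge two time-ordered lists of (timestamp, entry) pairs into one
    list in chronological order."""
    return list(heapq.merge(first, second, key=lambda pair: pair[0]))


file_a = [(datetime(1984, 2, 23, 3, 12, 43), "seq 1249"),
          (datetime(1984, 2, 24, 13, 14, 20), "seq 328")]
file_b = [(datetime(1984, 2, 23, 8, 15, 49), "seq 1713"),
          (datetime(1984, 2, 25, 10, 43, 36), "seq 85")]
print([entry for _, entry in merge_by_time(file_a, file_b)])
# ['seq 1249', 'seq 1713', 'seq 328', 'seq 85']
```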
STEP 9
The last thing RETRIEVE asks for is the destination of the output. If
you chose ASCII, the prompt is:
Output to (DSK:RETRIE.RPT):
If you chose BINARY, the prompt is:
Output to (DSK:RETRIE.SYS):
Type one of the following:
1. The RETURN key - to select the default RETRIE.RPT or
RETRIE.SYS.
2. TTY: - to direct ASCII formatted output to the terminal.
You should not request BINARY formatted output to be printed
on the terminal.
3. Any file name in the proper format for your system.
After you select the output destination and press RETURN, SPEAR asks
you to confirm your decision:
| Type <cr> to confirm (/GO):
| At this point, you can:
1. Press RETURN or type /GO to execute the RETRIEVE process.
2. Type /SHOW to list the parameters you have chosen.
3. Type /REVERSE to return to the previous prompt.
4. Type /BREAK to return to SPEAR> level.
5. Type question mark (?), HELP, the question mark switch (/?),
or /HELP to find out what your options are.
If your output is formatted in ASCII and you decide to output the file
to your disk area, you can list the file on the lineprinter by doing
the following:
1. Return to operating system command level by typing EXIT to the
SPEAR> prompt.
2. Use the PRINT command with any options available on your
operating system.
4.3.3.2 Sample RETRIEVE Session - The following is a sample RETRIEVE
session using the TOPS-20 system event file for input:
| @spear
|
| Welcome to SPEAR for TOPS-20. Version 2(605)
| Type "?" for help.
|
|
| SPEAR> retrieve
|
| RETRIEVE mode
| -------------
| Event or packet file (SERR:ERROR.SYS):
|
| Selection to be (INCLUDED):
|
| Selection type (ALL): error,diagnostic
|
| Category (ALL): disk
|
| Disk drives (ALL): RP07
|
| Disk error type (ALL): ?
|
| One or more of the following:
| ALL
| OFFLINE
| WRITE-LOCK
| UNSAFE
| MICROPROCESSOR
| SOFTWARE
| BUS
| CHANNEL-CONTROLLER
| READ-WRITE
| SEEK-SEARCH
| TIMING
| OTHER
| HELP
|
| Disk error type (ALL): read-write
|
| Next Category (FINISHED):
|
| Time from (EARLIEST):
|
| Time to (LATEST):
|
| Output mode (ASCII):
|
| Report format (SHORT): full
|
| Output to (DSK:RETRIE.RPT):
|
| Type <cr> to confirm (/GO):
4.3.3.3 Short Format - The following is a sample of a RETRIEVE report
in short format:
| @ty retrie.RPT
|
| SPEAR Version 2(565). Retrieval from SERR:ERROR.SYS
| Report generated 6-Mar-84 15:57:46-EST
| As directed by user
| Selected window: 23-Feb-84 00:00:01-EST to 26-Feb-84 00:00:01-EST.
| Selected records are included
| Selection type is ERRORS,
| Report sent to DSK:RETRIE.RPT
|
|
| SEQ TIME Thu 23 Feb 84
|
| 1249. 03:12:43 DP100 WORK: RP07 SERIAL #2861. CONI RH= 0,222715
| CHN STS= 540100,174632 SR= 0,51700 ER= 0,100000
| CYL/SURF/SEC= 212./27./3.
| 1713. 08:15:49 DP040 RP06 SERIAL #0125. CONI RH= 0,202615
| CHN STS= 500000,305600 SR= 0,51700 ER= 0,100000
| CYL/SURF/SEC= 0./0./1.
| 1875. 11:26:39 DP000 SERR: RP06 SERIAL #0941. CONI RH= 0,222615
| CHN STS= 540100,174024 SR= 0,51700 ER= 0,100000
| CYL/SURF/SEC= 603./10./16.
|
| SEQ TIME Fri 24 Feb 84
|
| 328. 13:14:20 DP010 PUBLIC: RP06 SERIAL #0484. CONI RH= 0,222615
| CHN STS= 540100,174066 SR= 0,51700 ER= 0,100000
| CYL/SURF/SEC= 93./12./0.
| 372. 17:04:09 DP000 SERR: RP06 SERIAL #0941. CONI RH= 0,222615
| CHN STS= 540100,174024 SR= 0,51700 ER= 0,100000
| CYL/SURF/SEC= 361./15./16.
|
| SEQ TIME Sat 25 Feb 84
|
| 85. 10:43:36 DP110 GALAXY: RP07 SERIAL #251D. CONI RH= 0,322615
| CHN STS= 540100,174632 SR= 0,51700 ER= 0,400
| CYL/SURF/SEC= 623./15./35.
|
|
|
| 4.3.3.4 Octal Format - The following is a sample of a RETRIEVE report
| in octal format.
|
|
| SPEAR Version 2(565). Retrieval from SERR:ERROR.SYS
| Report generated 6-Mar-84 16:08:12-EST
| As directed by user
| Selected window: 23-Feb-84 00:00:01-EST to 26-Feb-84 00:00:01-EST.
| Selected records are included
| Selection type is ERRORS,
| Report sent to DSK:RETRIE.OCTAL
|
|
|
| Sequence # 1249 -- Record HEADER:
| 0/ 111001,,125124
| 1/ 131271,,257140
| 2/ 0,,116617
| 3/ 0,,5467
| 4/ 0,,2341
|
| Record BODY:
| 0/ 0,,0
| 1/ 675762,,530000
| 2/ 1242,,440147
| 3/ 1,,74014
| 4/ 100000,,1
| 5/ 0,,222715
| 6/ 0,,2415
| 7/ 0,,35624
| 10/ 1,,234156
| 11/ 0,,172464
| 12/ 0,,0
| 13/ 0,,0
| 14/ 0,,0
| 15/ 732200,,177471
| 16/ 732200,,177471
| 17/ 720000,,15403
| 20/ 720000,,15403
| 21/ 0,,715652
| 22/ 600001,,0
| 23/ 0,,1
| 24/ 0,,0
| 25/ 0,,0
| 26/ 0,,0
| 27/ 0,,324
| 30/ 0,,2214
|
| .
| .
| .
|
| Sequence # 1713 -- Record HEADER:
| 0/ 111001,,125124
| 1/ 131271,,432751
| 2/ 0,,272430
| 3/ 0,,5467
| 4/ 0,,3261
|
| Record BODY:
| 0/ 0,,0
| 1/ 0,,0
| 2/ 1242,,440146
| 3/ 0,,1
| 4/ 100000,,1
| 5/ 0,,202615
| 6/ 0,,2415
| 7/ 0,,0
| 10/ 0,,466
| 11/ 0,,0
| 12/ 0,,0
| 13/ 0,,0
| 14/ 0,,0
| 15/ 732204,,177771
| 16/ 732204,,177771
| 17/ 720004,,1
| 20/ 720004,,1
| 21/ 0,,715436
| 22/ 200001,,0
| 23/ 0,,1
| 24/ 0,,0
| 25/ 0,,0
| 26/ 0,,0
| 27/ 0,,0
| 30/ 0,,1
|
| .
| .
| .
| 4.3.3.5 Full Format - The following is a sample of a RETRIEVE report
| in full format.
RETRIEVE SESSION
|
| SPEAR Version 2(565). Retrieval from SERR:ERROR.SYS
| Report generated 6-Mar-84 16:02:31-EST
| As directed by user
| Selected window: 23-Feb-84 00:00:01-EST to 26-Feb-84 00:00:01-EST.
| Selected records are included
| Selection type is ERRORS,
| Report sent to DSK:RETRIE.FULL
|
|
|
|
| ***********************************************
| MASSBUS DEVICE ERROR
| LOGGED ON Thu 23 Feb 84 03:12:43 MONITOR UPTIME WAS 3:41:34
| DETECTED ON SYSTEM # 2871.
| RECORD SEQUENCE NUMBER: 1249.
| ***********************************************
| UNIT NAME: DP100
| UNIT TYPE: RP07
| UNIT SERIAL #: 2861.
| VOLUME ID: WORK
| LBN AT START OF XFER: 1074014 =
| CYL: 212. SURF: 27. SECT: 3.
| OPERATION AT ERROR: DEV.AVAIL., GO + READ DATA(70)
| FINAL ERROR STATUS: 100000,1
| RETRIES PERFORMED: 2.
| ERROR: RECOVERABLE
| DRIVE EXCEPTION,CHN ERROR, IN CONTROLLER CONI
| DCK, IN DEVICE ERROR REGISTER
|
| CONTROLLER INFORMATION:
| CONTROLLER: RH20 # 1
| CONI AT ERROR: 0,222715 =
| DRIVE EXCEPTION,CHN ERROR,
| CONI AT END: 0,2415 =
| NO ERROR BITS DETECTED
| DATAI PTCR AT ERROR: 732200,177471
| DATAI PTCR AT END: 732200,177471
| DATAI PBAR AT ERROR: 720000,15403
| DATAI PBAR AT END: 720000,15403
|
| CHANNEL INFORMATION:
| CHAN STATUS WD 0: 200000,174567
| CW1: 0,0 CW2: 0,0
| CHN STATUS WD 1: 540100,174632 =
| NOT SBUS ERR,NOT WC = 0,LONG WC ERR,
| CHN STATUS WD 2: 614005,377200
|
| DEVICE REGISTER INFORMATION:
| AT ERROR AT END DIFF.
| CR(00): 4070 4070 0
| DEV.AVAIL., READ DATA(70)
| SR(01): 51700 11700 40000
| ERR,MOL,PGM,DPR,DRY,VV,
| ER(02): 100000 0 100000
| DCK,
| MR(03): 0 0 0
| AS(04): 0 0 0
| DA(05): 15404 15407 3
| D. TRK = 33, D.SECT. = 4
| DT(06): 24042 24042 0
| LA(07): 1700 700 1000
| SN(10): 24141 24141 0
| OF(11): 0 0 0
| DC(12): 324 324 0
| 212.
| CC(13): 324 324 0
| 212.
| E2(14): 0 0 0
| NO ERROR BITS DETECTED
| E3(15): 0 0 0
| NO ERROR BITS DETECTED
| EP(16): 1454 0 1454
| PL(17): 2400 0 2400
|
| DEVICE STATISTICS AT TIME OF ERROR:
| # OF READS: 342126. # OF WRITES: 62772. # OF SEEKS: 15252.
| # SOFT READ ERRORS: 1. # SOFT WRITE ERRORS: 0.
| # HARD READ ERRORS: 0. # HARD WRITE ERRORS: 0.
| # SOFT POSITIONING ERRORS: 0.
| # HARD POSITIONING ERRORS: 0.
| # OF MPE: 0. # OF NXM: 0. # OF OVERRUNS: 0.
4.4 SUMMARIZE
SUMMARIZE reads the system event file and summarizes its contents
according to the following categories:
1. Event code
2. STOPCODE (TOPS-10)
3. BUGCHK, BUGHLT, BUGINF (TOPS-20)
4. Front-end reloads
5. Channel errors
6. Disk errors
7. Magnetic tape errors
The SUMMARIZE report also contains Error Distribution tables. These
tables show the hour-by-hour distribution of events over a 24-hour
period, broken down by subsystem. With these tables, you can determine
when large numbers of events are occurring. Once you know the
subsystem (Mainframe, Disk, Tape, and so forth) and the timeframe, you
can use RETRIEVE or ANALYZE to pinpoint the specific device that is
causing the problem.
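An Error Distribution table is essentially a two-way tally. Each event is counted in the row for the hour in which it was logged and in the column for its subsystem, with row and column totals. The Python sketch below shows only that tallying step, under simplifying assumptions. Events are given as hypothetical (timestamp, subsystem) pairs, and the subsystem names follow the column headings of the sample report. How SPEAR actually assigns an entry to a subsystem is not modeled here.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed column names, following the headings of the sample table.
SUBSYSTEMS = ["Mainframe", "Disk", "Tape", "Unit rec", "Comm",
              "Network", "Software", "Crash"]


def distribution(events):
    """Tally (datetime, subsystem) pairs into hour-of-day rows.

    Returns (rows, totals): rows maps an hour (0-23) to a Counter of
    subsystem -> count, and totals is the per-subsystem column total.
    """
    rows = defaultdict(Counter)
    totals = Counter()
    for when, subsystem in events:
        if subsystem not in SUBSYSTEMS:
            raise ValueError(f"unknown subsystem: {subsystem!r}")
        rows[when.hour][subsystem] += 1
        totals[subsystem] += 1
    return rows, totals


def print_table(rows, totals):
    """Print the tally; hours with no events are omitted, as in the sample."""
    header = " ".join(f"{name[:5]:>5}" for name in SUBSYSTEMS)
    print(f"{'Hour':<13}{header} {'Total':>5}")
    for hour in sorted(rows):
        counts = rows[hour]
        label = f"{hour}:00 - {(hour + 1) % 24}:00"
        cells = " ".join(f"{counts[name] or '':>5}" for name in SUBSYSTEMS)
        print(f"{label:<13}{cells} {sum(counts.values()):>5}")
    cells = " ".join(f"{totals[name] or '':>5}" for name in SUBSYSTEMS)
    print(f"{'Totals':<13}{cells} {sum(totals.values()):>5}")


if __name__ == "__main__":
    sample = [
        (datetime(1984, 3, 14, 8, 15), "Disk"),
        (datetime(1984, 3, 14, 8, 40), "Crash"),
        (datetime(1984, 3, 14, 23, 5), "Tape"),
    ]
    print_table(*distribution(sample))
```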
After reading the file, SUMMARIZE produces an ASCII report file
containing the summaries and Error Distribution tables and stores it
in your disk area (or wherever you specify). You can then print the
report on the lineprinter for inspection. You can also print the
report on the terminal by specifying TTY: to SPEAR's request for the
output destination.
SUMMARIZE allows you to pinpoint the timeframe of the summaries by
requesting a beginning date and an ending date to search for in the
system event file. In addition, you can also specify a binary file
created with the RETRIEVE process (RETRIE.SYS) for input. See Section
4.3 for information on RETRIEVE.
4.4.1 The SUMMARIZE Report
| The following example is representative of a SUMMARIZE report in that
| it contains:
|
| o File environment information
|
| o Entry occurrence counts
|
| o System event codes, shown in parentheses under entry
| occurrence counts
|
| o Summaries of bugchecks and subsystems
|
| o Error distribution tables
|
| Note that if SUMMARIZE cannot identify the media name in a report
| that includes media identification, it prints one of three
| placeholders instead:
|
| 1. <unknown> - if SUMMARIZE does not find a mount record in the
| error file prior to the time of the error.
|
| 2. <none> - if a series of mount and dismount records indicate
| no medium was mounted at the time of the error, such as an
| error occurring during the mount process.
|
| 3. <blank> - if SUMMARIZE finds a mount record but the
| medium-name field of the mount record is empty.
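The three placeholder rules can be read as a replay of the mount and dismount records that precede the error. The following Python sketch illustrates that reading. It assumes a hypothetical record layout of (time, kind, media name) tuples for a single drive, sorted by time. SPEAR's own record formats are not shown here.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

# Hypothetical record: (time, "MOUNT" or "DISMOUNT", media name or "").
Record = Tuple[datetime, str, str]


def media_at(error_time: datetime, records: Sequence[Record]) -> str:
    """Return the media name to report for an error on one drive."""
    seen_mount = False
    current: Optional[str] = None      # None means nothing is mounted
    for when, kind, name in records:
        if when > error_time:
            break                      # only records before the error count
        if kind == "MOUNT":
            seen_mount = True
            current = name
        elif kind == "DISMOUNT":
            current = None
    if not seen_mount:
        return "<unknown>"             # no mount record before the error
    if current is None:
        return "<none>"                # records show nothing mounted
    if current == "":
        return "<blank>"               # mount record with empty name field
    return current
```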
|
| Note that the error register codes listed in the report are
| described in Section 4.4.2.
|
| File Environment
|
| SPEAR Version 2(613)
| Input file: SERR:ERROR.SYS Created: 12-Mar-84 08:49:00-EST
| Output file: DSK:SUMMAR.RPT
|
| Selection Criteria: ALL
|
| Date of first entry processed: 14-Mar 01:22:13
| Date of last entry processed: 14-Mar 23:53:38
|
| Number of entries processed: 1128.
| Number of inconsistencies detected in error file: 0.
|
| Entry Occurrence Counts:
|
| 9. SYSTEM RELOAD ...(101)
| 496. MONITOR BUG ...(102)
| 36. MASSBUS ERROR ...(111)
| 120. STATISTICS ...(114)
| 8. CONFIGURATION CHANGE ...(115)
| 102. FRONT END DEVICE ERROR ...(130)
| 1. CPU PARITY INTERRUPT ...(162)
| 294. PHASE III DECNET ENTRY ...(240)
| 62. HSC50 ERROR LOG ...(243)
|
|
| Monitor Detected Errors and Reloads:
|
| 43. BUGCHK
| 4. BUGHLT
| 449. BUGINF
| Monitor Error and Reload Breakdown:
|
| BUGCHK Breakdown
| 8. FLKTIM
| 2. KLPERR
| 17. MSCORO
| 3. NODDMP
| 5. PI2ERR
| 4. SCACVC
| 4. SCATMO
|
| BUGHLT Breakdown
| 1. ILPSEC
| 1. NOTOFN
| 1. SKDPF1
| 1. UNPGF2
|
| BUGINF Breakdown
| 8. CFCONN
| 4. KLPCVC
| 29. KLPNUP
| 1. KLPRRQ
| 1. KLPSTR
| 28. MSCAVA
| 2. MSCDSR
| 7. MSCPTG
| 324. NSPBAD
| 29. NSPLAT
| 2. NTOHNG
| 1. SPRZRO
| 1. TM8AEI
| 12. TTYSTP
|
| Front-end Summary:
|
| 10. CD20
| 10. DH11
| 10. DL11C
| 10. DM11
| 1. DM11-3
| 6. KLCPU
| 45. KLERR records forming 5. full entries
| 10. LP20
|
| DECnet Phase III Summary:
|
| Class.Type Count Description
|
| 0.0 10. Event records lost
| 0.3 8. Automatic line service
| 2.0 2. Local node state change
| 4.0 29. Aged packet loss
| 4.1 233. Node unreachable packet loss
| 4.4 1. Packet format error
| 4.7 6. Circuit down, circuit fault
| 4.10 5. Circuit up
|
|
| RH20 Channel/Controller Summary:
|
| Hard Soft
| # 1 0. 1.
| # 2 5. 30.
|
|
| RP07 Summary:
|
| Hard Soft
| S/N 2861
| DP100 0. 1.
|
|
| TM78 Summary:
|
| Hard Soft
| S/N 4404
| MT200 2. 4.
| S/N 5242
| MT210 3. 26.
|
|
| RH20 Breakdown (CONI)
|
| PAR LWC SWC CHN RES OVR
| ERR EXC ERR ERR ERR ERR RAE RUN
|
|
| DP100
| SOFT 1. 1.
|
| MT200
| HARD 2.
| MT200
| SOFT 4.
|
| MT210
| HARD 3.
| MT210
| SOFT 26.
|
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
| * *
| * Disk Subsystem Error Summary *
| * *
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
|
|
| Disk Subsystem Error Entries Summarized by Device, then Error Type.
| Where the Error Types are the following:
|
| OTHER = OTHER
| TIMIN = TIMING
| SK-SR = SEEK-SEARCH
| READ = READ-WRITE
| CH-CO = CHANNEL-CONTROLLER
| BUS = BUS
| SOFT = HARDWARE DETECTED SOFTWARE ERROR
| MICRO = MICROPROCESSOR DETECTED ERROR
| UNSAF = UNSAFE
| WRTLK = WRITE LOCK
| OFFLI = OFFLINE
|
|
|
| OTHER TIMIN SK-SR READ CH-CO BUS SOFT MICRO UNSAF WRTLK OFFLI
| ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
|
| DP100
| 1.
| DU-7-14-17
| 36. 3.
| DU-7-3-17
| 19. 3. 1.
|
| Read Data Errors further summarized by Drive and Media ID.
|
| Drive Media Error Totals
| ------- ------ --------
|
| DP100 WORK 1.
|
|
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
| * *
| * This report summarizes all Read Data Errors by Drive and Media ID *
| * *
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| DRIVE MEDIA CYL TRK SECT HARD SOFT RETRIES LBN
| ----- ----- --- --- ---- ---- ---- ------- ---
|
| DP100 WORK 565. 5. 15. 0. 1. 2. 2,,756704
|
|
| RP07 BREAKDOWN:
|
| Error Register 1
|
| D U O D W I A H H E W F P R I I
| C N P T L A O C C C C E A M L L
| K S I E E E E R E H F R R R R F
| C
|
| S/N 2861
| DP100 S 1.
|
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
| * *
| * Tape Subsystem Error Summary *
| * *
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
|
|
| Tape Subsystem Error Entries Summarized by Device, then Error Type.
| Where the Error Types are the following:
|
| OTHER = OTHER
| READ = READ
| WRITE = WRITE
| FORMT = DEVICE FORMAT
| CH-CO = CHANNEL-CONTROLLER
| BUS = BUS
| SOFT = HARDWARE DETECTED SOFTWARE ERROR
| OPER = OPERATOR
| OFFLI = OFFLINE
|
|
|
| OTHER READ WRITE FORMT CH-CO BUS SOFT OPER OFFLI
| ----- ----- ----- ----- ----- ----- ----- ----- -----
|
| MT200 6.
|
| MT210 29.
|
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
| * *
| * SUMMARY of all Errors sorted by Media and Drive by *
| * Operation. *
| * *
| *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| Operation : WRITE Related
|
| MEDIA
| ID UNIT ID
|
| MT200 MT210 TOTAL
| ------ ------ ------
| unknown ! 6. ! 29. ! 35.
|
| TOTAL ! 6. ! 29. ! 35.
|
|
|
| TM78 Breakdown:
| (Interrupt and Failure Codes are OCTAL)
| Interrupt Failure Hard Soft
| Code Code
|
| S/N 4404
| MT200 22 (WRITE) 7 0. 3.
| MT200 22 (WRITE) 10 0. 1.
| MT200 22 (WRITE) 14 2. 0.
| S/N 5242
| MT210 22 (WRITE) 1 0. 7.
| MT210 22 (WRITE) 4 0. 10.
| MT210 22 (WRITE) 7 0. 1.
| MT210 22 (WRITE) 10 0. 8.
| MT210 22 (WRITE) 14 3. 0.
|
|
| Error distribution
|
| Main-|Disk |Tape |Unit |Comm |Net- |Soft-|Crash|Totals
| 14-Mar-84 frame| | |rec | |work |ware | |
| -----+-----+-----+-----+-----+-----+-----+-----+-----
| 1:00 - 2:00 | | | | | 6.| | 5.| 11.
| 6:00 - 7:00 | 7.| | | | 6.| | 12.| 25.
| 8:00 - 9:00 19.| 35.| | | | 13.| | 64.| 133.
| 9:00 - 10:00 | 20.| | | | 5.| | 31.| 56.
| 10:00 - 11:00 9.| | | | | 10.| | 7.| 28.
| 11:00 - 12:00 9.| | | | | 6.| | 6.| 22.
| 12:00 - 13:00 | | | | | | | 3.| 3.
| 13:00 - 14:00 | | | | | 1.| | 9.| 10.
| 14:00 - 15:00 | | | | | 3.| | 7.| 10.
| 15:00 - 16:00 9.| | 4.| | | 27.| | 45.| 86.
| 16:00 - 17:00 | | | | | 91.| | 76.| 167.
| 17:00 - 18:00 | | | | | 19.| | 6.| 25.
| 18:00 - 19:00 | | 2.| | | 22.| | 38.| 62.
| 19:00 - 20:00 | | 11.| | 1.| 17.| | 43.| 72.
| 20:00 - 21:00 | 1.| 8.| | | 21.| | 39.| 69.
| 21:00 - 22:00 | | 4.| | | 19.| | 38.| 61.
| 22:00 - 23:00 | | 2.| | | 12.| | 38.| 52.
| 23:00 - 0:00 | | 4.| | | 16.| | 38.| 58.
| -----+-----+-----+-----+-----+-----+-----+-----+-----
| Totals 46.| 63.| 35.| | 1.| 294.| | 505.| 950.
|
| Because of the addition of the CI and HSC50, the SUMMARIZE report
| uses another format for the names of some disks. The preceding
| sample report contains the following names:
|
| DU-7-14-17
| DU-7-3-17
|
| From left to right, the four hyphen-separated fields represent the
| following:
|
| Field one Device type DU = RA80, RA81
| DJ = RA60
| ?? = unknown
|
| Field two RH slot number for the CI20. This is always
| number 7.
|
| Field three HSC50 node number on the CI.
|
| Field four Drive number on the push button. If the
| HSC50 cannot get this number, the number 4095
| appears in this field.
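The four-field name can be split mechanically. The Python sketch below is one way to do it. The class and function names are hypothetical. It assumes the numeric fields are decimal, which the 4095 "unknown drive" value suggests but this section does not state.

```python
from typing import NamedTuple, Optional

# Field one: device-type codes listed above.
DEVICE_TYPES = {"DU": "RA80 or RA81", "DJ": "RA60", "??": "unknown"}
UNKNOWN_DRIVE = 4095   # reported when the HSC50 cannot get the drive number


class HscDiskName(NamedTuple):
    device_type: str
    rh_slot: int           # RH slot of the CI20 (always 7)
    hsc_node: int          # HSC50 node number on the CI
    drive: Optional[int]   # None when the HSC50 reported 4095


def parse_hsc_disk_name(name: str) -> HscDiskName:
    """Split a name such as 'DU-7-14-17' into its four fields."""
    parts = name.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four hyphen-separated fields: {name!r}")
    code, slot, node, drive = parts
    if code not in DEVICE_TYPES:
        raise ValueError(f"unrecognized device type code: {code!r}")
    drive_no = int(drive)    # assumption: fields are decimal
    return HscDiskName(
        device_type=DEVICE_TYPES[code],
        rh_slot=int(slot),
        hsc_node=int(node),
        drive=None if drive_no == UNKNOWN_DRIVE else drive_no,
    )
```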
|
| Note you will find a description of the Disk Subsystem Error Bits in
| Appendix E.
4.4.2 Error Register Codes
The following tables contain brief explanations of the abbreviations
of the error register codes (MASSBUS disk registers for RP04s and
RP06s and tape registers for TU45s, TU77s, and TE16s).
|
|
| Table 4-5: MASSBUS Disk Registers
|
|
Error Register 1
| Code Meaning
DCK Data Check
UNS Unsafe
OPI Operation Incomplete
DTE Drive Timing Error
WLE Write Lock Error
IAE Invalid Address Error
AOE Address Overflow Error
HCRC Header CRC Error
HCE Header Compare Error
ECH ECC Hard Error
WCF Write Clock Fail
FER Format Error
PAR Parity Error
RMR Register Modification Refused
ILR Illegal Register
ILF Illegal Function
Error Register 2
| Code Meaning
ACU RP04 - AC Unsafe
RP06 - Unused
PLU Phase Locked Oscillator Unsafe
30VU RP04 - 30 Volts Unsafe
RP06 - Unused
IXE Index Error
NHS No Head Select
MHS Multiple Head Select
WRU Write Ready Unsafe
FEN RP04 - Failsafe Enabled
ABS RP06 - Abnormal Stop
TUF Transition Unsafe
TDF Transition Detector Failure
MSE RP04 - Motor Sequence Error
R&W RP06 - Read and Write
CSU Current Switch Unsafe
WSU Write Select Unsafe
CSF Current Sink Failure
WCU Write Current Unsafe
Error Register 3
| Code Meaning
OCYL Off Cylinder
SKI Seek Incomplete
OPE RP04 - Unused
RP06 - Operator Plug Error
ACL AC Voltage Unsafe
DCL DC Voltage Unsafe
DIS RP04 - Unused
35V 35 Volts Unsafe
UWR RP04 - Any Unsafe Except Read/Write
RP06 - Unused
VUF RP04 - Velocity Unsafe
WOF RP06 - Write and Unsafe
PSU RP04 - Pack Speed Unsafe
DCU RP06 - DC Voltage Unsafe
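A register value can be translated into these codes by testing one bit per code. The Python sketch below assumes that the Error Register 1 codes in Table 4-5 correspond, in the order listed, to bits 15 down to 0 of the register. That assumption agrees with the sample full-format report in Section 4.3.3.5, where ER(02) = 100000 (octal) is translated as DCK, and with the column order of the RP07 Breakdown in the sample SUMMARIZE report. Other registers would need their own tables.

```python
# Error Register 1 codes in Table 4-5 order, assumed to be bits 15..0.
ER1_CODES = ["DCK", "UNS", "OPI", "DTE", "WLE", "IAE", "AOE", "HCRC",
             "HCE", "ECH", "WCF", "FER", "PAR", "RMR", "ILR", "ILF"]


def decode_er1(value: int) -> list[str]:
    """Return the Error Register 1 codes whose bits are set in value."""
    if not 0 <= value <= 0o177777:
        raise ValueError("Error Register 1 is a 16-bit register")
    return [code for i, code in enumerate(ER1_CODES)
            if value & (1 << (15 - i))]


if __name__ == "__main__":
    print(decode_er1(0o100000))   # ['DCK']
```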
|
|
| Table 4-6: Tape Registers
|
|
| Code Meaning
COR/CRC PE - Correctable Data Error
NRZI - CRC Does Not Match Computed CRCC
UNS Unsafe
OPI Operation Incomplete
DTE Drive Timing Error
NEF Nonexecutable Function
CS/ITM PE - Correctable Skew
NRZI - Illegal Tape Mark
FCE Frame Count Error
NSG Nonstandard Gap Tape Character
PEF/LRC PE - Format Error
NRZI - Longitudinal Redundancy Check
INC/VPE PE - Noncorrectable Data Error
NRZI - Vertical Parity Error
DPA Data Bus Parity Error
FMT Format Error
PAR Control Bus Parity
RMR Register Modification Refused
ILR Illegal Register
ILF Illegal Function
4.4.3 SUMMARIZE Procedure
SUMMARIZE prompts with one or more of the following guidewords:
SUMMARIZE Mode
______________
Event file (SERR:ERROR.SYS):
| Category (ALL):
Time from (EARLIEST):
Time to (LATEST):
| Show Error Distribution (YES):
Report to (DSK:SUMMAR.RPT):
| If you want to take all the defaults, type S/G to the SPEAR> prompt;
otherwise, read the following procedure:
STEP 1
| After you type SUMMARIZE to the SPEAR> prompt, SUMMARIZE requests the
name of the input file:
Event file (SERR:ERROR.SYS): TOPS-20
or
Event file (SYS:ERROR.SYS): TOPS-10
Type one of the following:
1. The RETURN key - to take the default, the system event file.
2. The name of a file you have previously RETRIEVEd, in binary
format, for example RETRIE.SYS.
3. Any file in binary format containing events from the system
event file.
| STEP 2
|
| SUMMARIZE asks for the category of the summary in which you are
| interested:
|
| Category (ALL):
|
| Type one of the following:
|
| 1. The RETURN key or A[LL] - to take the default of all
| categories.
|
| 2. M[AINFRAME] - to select a summary for mainframe events.
|
| 3. D[ISK] - to select a summary for disk devices.
| 4. T[APE] - to select a summary of tape devices.
|
| 5. CI - to select a summary of CI-related events.
|
| 6. NI - to select a summary of NI-related events.
|
| 7. U[NITRECORD] - to select a summary of hard-copy devices.
|
| 8. NE[TWORK] - to select a summary of network-related events.
|
| 9. O[PERATING-SYSTEM] - to select a summary of software-related
| events.
|
| 10. CO[MM] - to select a summary of communication devices.
|
| 11. P[ACKID] - to select a summary of specific disk packs.
|
| 12. R[EELID] - to select a summary of specific tape reels.
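In the list above, the letters outside the brackets appear to be the shortest abbreviation SPEAR accepts for each keyword. For example, NE is needed to distinguish NETWORK from NI. The Python sketch below matches typed input against the keywords on that reading. It is an illustration of the bracket notation only, not SPEAR's actual command parser, and the function names are hypothetical.

```python
# Category keywords from STEP 2; the text before "[" is taken to be
# the shortest accepted abbreviation.
CATEGORY_SPECS = ["A[LL]", "M[AINFRAME]", "D[ISK]", "T[APE]", "CI", "NI",
                  "U[NITRECORD]", "NE[TWORK]", "O[PERATING-SYSTEM]",
                  "CO[MM]", "P[ACKID]", "R[EELID]"]


def parse_spec(spec: str) -> tuple[str, int]:
    """Turn 'NE[TWORK]' into ('NETWORK', 2)."""
    if "[" in spec:
        head, tail = spec.rstrip("]").split("[")
        return head + tail, len(head)
    return spec, len(spec)


KEYWORDS = [parse_spec(spec) for spec in CATEGORY_SPECS]


def match_category(typed: str) -> str:
    """Return the full keyword for an abbreviation, ignoring case."""
    word = typed.strip().upper()
    for full, minimum in KEYWORDS:
        if len(word) >= minimum and full.startswith(word):
            return full
    raise ValueError(f"not a recognized category: {typed!r}")


if __name__ == "__main__":
    print(match_category("main"))   # MAINFRAME
```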
|
| Every category except ALL, COMM, and NI prompts for specific device
| types. Table 4-7 lists the subprompts you can expect.
|
|
| Table 4-7: Subprompts for Device Types
|
|
| Device Type Subprompt
|
| MAINFRAME Mainframe devices (ALL):
| DISK Disk drives (ALL):
| TAPE Tape drives (ALL):
| CI CI controllers (ALL):
| UNITRECORD Unit record devices (ALL):
| NETWORK Event class and type (ALL):
| OPERATING-SYSTEM Operating System codes (ALL):
| PACKID Disk (structure IDs):
| REELID Tape (reel IDs):
|
| STEP 3
|
| SUMMARIZE keeps prompting you for categories until you either type
| FINISHED or press the RETURN key:
|
| Next Category (FINISHED):
|
| Type one of the following:
|
| 1. The RETURN key or F[INISHED] - to take the default.
|
| 2. Another category.
|
| STEP 4
After you have finished specifying categories, SUMMARIZE prompts you
for the date and time at which you want the summary to begin:
Time from (EARLIEST):
Type one of the following:
1. The RETURN key - to take the default EARLIEST, the first
event in the file.
2. A date and time in the format dd-mmm-yy hh:mm:ss - to signify
where to begin extracting entries. A date by itself defaults
to one second after midnight.
3. A relative date in the format -nn, where nn is a number of
days before the current date. For example, -7 causes
SUMMARIZE to begin extracting entries seven days prior to the
current day.
| STEP 5
SUMMARIZE then prompts for the end of the time period:
Time to (LATEST):
Type one of the following:
1. The RETURN key - to take the default LATEST, the last entry
in the system event file.
2. A date and time in the format dd-mmm-yy hh:mm:ss - to
indicate the last date for extracted entries. A date by
itself defaults to one second after midnight.
3. A relative date in the format -nn, where nn is a number of
days before the current date. For example, -13 causes
SUMMARIZE to stop extracting entries recorded thirteen days
before the current date.
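The three kinds of answer to the Time from and Time to prompts can be interpreted as in the Python sketch below. Two points are assumptions. First, the sketch treats a -nn answer as 00:00:01 on the day nn days earlier; the manual gives only the day. Second, it ignores time zones. The rule that a date alone means one second after midnight matches the selected windows printed in the sample RETRIEVE reports, such as 23-Feb-84 00:00:01-EST.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_time_spec(text: str, now: datetime) -> Optional[datetime]:
    """Interpret a 'Time from' or 'Time to' answer.

    Returns None for an empty answer (the EARLIEST/LATEST default).
    """
    text = text.strip()
    if not text:
        return None
    if text.startswith("-"):
        days = int(text[1:])
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        # Assumption: a relative day count starts at 00:00:01 that day.
        return midnight - timedelta(days=days) + timedelta(seconds=1)
    for fmt in ("%d-%b-%y %H:%M:%S", "%d-%b-%y"):
        try:
            moment = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if fmt == "%d-%b-%y":
            moment += timedelta(seconds=1)   # date alone: 00:00:01
        return moment
    raise ValueError(f"unrecognized time: {text!r}")
```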
| STEP 6
|
| After you specify a timeframe, SUMMARIZE asks whether to include the
| error distribution tables in the report:
|
| Show Error Distribution (YES):
|
| Type one of the following:
|
| 1. The RETURN key or Y[ES] - to take the default. This will
| give you all the error distribution charts relevant to the
| time constraints you specify.
|
| 2. N[O] - to suppress the error distribution charts from the
| report.
|
| STEP 7
The last thing SUMMARIZE asks for is the destination of the output:
Report to (DSK:SUMMAR.RPT):
Type one of the following:
1. The RETURN key - to take the default DSK:SUMMAR.RPT.
2. Any file name in the proper format.
3. TTY: - to have the report printed on your terminal. Note
that if you specify TTY:, SUMMARIZE does not save the file in
your disk area.
After you select the output destination and press RETURN, SPEAR asks
you to confirm your decision.
| Type <cr> to confirm (/GO):
At this point you can:
1. Press RETURN or type /GO to execute the SUMMARIZE process.
2. Type /SHOW to list the parameters you have chosen.
3. Type /REVERSE to return to the previous prompt.
4. Type /BREAK to return to SPEAR level.
5. Type question mark (?), HELP, the question mark switch (/?),
or /HELP to find out what your options are.
To read the SUMMARIZE report, you can list the file on the lineprinter
by doing the following:
Return to operating system command level by typing EXIT to the
SPEAR> prompt.
Use the PRINT command with any options available on your
operating system.
Note that if you specified TTY: to the Report to: prompt, you will
not have a file saved in your area to print.
4.4.4 Sample SUMMARIZE Session
The following is a sample of a SUMMARIZE session using the system
event file for input:
| @spear
|
| Welcome to SPEAR for TOPS-20. Version 2(605)
| Type "?" for help.
|
| SPEAR> summarize
|
| SUMMARIZE mode
| --------------
| Event file (SERR:ERROR.SYS):
|
| Category (ALL): main
|
| Mainframe devices (ALL): cpu
|
| Next Category (FINISHED): disk
|
| Disk drives (ALL): rp07
|
| Next Category (FINISHED):
|
| Time from (EARLIEST):
|
| Time to (LATEST):
|
| Show Error Distribution (YES): no
|
| Report to (DSK:SUMMAR.RPT):
|
| Type <cr> to confirm (/GO):
| INFO - Summarizing ST:GIDNEY.02-27
| INFO - Now sending summary to DSK:SUMMAR.RPT
| INFO - Summary output finished
|
|
| SPEAR> ex
4.5 TOPS-20 KLSTAT MODE
On TOPS-20, there is an additional troubleshooting aid that can be
helpful if severe intermittent faults do not leave enough information
in the system event file. This feature is the KLSTAT mode. When you
turn KLSTAT on, you are actually turning on a monitor flag that tells
the monitor to record additional information into the system event
file when any CPU, memory, or MASSBUS errors occur.
Note that turning on this flag causes severe system degradation (the
system goes down while KLSTAT is collecting data), so you should turn
it on only when absolutely necessary. In fact, you must have special
privileges to turn it on or off.
When the KLSTAT mode is in operation, the system event file will
contain KL CPU STATUS BLOCK entries. For a sample of such an entry,
turn to Section 5.3.12. For the KLSTAT procedure, see Section 4.5.1.
4.5.1 KLSTAT Procedure
The KLSTAT mode has three functions: ON, OFF, and CHECK. The
following procedure describes their use:
STEP 1
First, enable your special privileges at monitor level, either
OPERATOR or WHEEL privileges. Then access SPEAR. (Note, you do not
need privileges to CHECK the status of KLSTAT.)
STEP 2
Once at the SPEAR prompt, type K[LSTAT]:
SPEAR>KLSTAT
SPEAR responds with:
SPEAR>KLSTAT
KLSTAT mode
___________
Extra reporting (CHECK):
STEP 3
At this point, type one of the three options. Pressing the Escape key
gets you the default, CHECK. If you type ON, you will get this
message:
The following should be noted before proceeding!
This function can cause SEVERE system degradation!
If you decide not to risk it, type /R to return to the SPEAR prompt.
STEP 4
If you respond with one of the three choices, SPEAR prompts with:
| Type <cr> to confirm (/GO):
If you chose ON or OFF, SPEAR returns you to the SPEAR prompt. If you
chose CHECK, the default, SPEAR prints one of the following:
(KLSTAT) Extra error reporting is currently enabled.
or
(KLSTAT) Extra error reporting is currently disabled.
To examine the information gathered while KLSTAT mode is on, look for
KL CPU STATUS BLOCK entries in the system event file. See Section
5.3.12.
CHAPTER 5
ENTRY DESCRIPTIONS
5.1 INTRODUCTION
This chapter provides a sample of most of the events that can be
recorded in the system event file. These samples appear just as you
see them when you use RETRIEVE to translate entries from binary to
ASCII. Although the entries differ in format, they share common
sections; some entries contain more sections than others, depending on
the operating system involved. Each entry may contain from one to six
sections of information:
Section 1 Entry Description
Section 2 Unit Identification
Section 3 Software Status
Section 4 Controller Status
Section 5 Device or Unit Status
Section 6 Statistical Information
Every entry has at least a Section 1, Entry Description. This section
contains:
1. Type of entry and/or type of error
2. Date and time the entry was logged
3. Monitor uptime
4. System serial number
Entries may contain Sections 2 through 6. Section 2 contains
the following information:
1. Unit logical name
2. Unit physical name
3. Unit type
4. Media identification
5-1
ENTRY DESCRIPTIONS
Section 3 contains the following:
1. Highest process requesting service (user)
2. Lowest process requesting service (author)
3. User/process identification (user identification, program
name, file name, program location in memory, and so forth)
4. Pertinent system registers (processor flags, program counter,
and so forth) before and/or after error as applicable
5. Disposition of event (retry count, recovered or not, the
point in the retry algorithm where recovery was effected, and
so forth)
6. Other I/O activity at error time
Section 4 contains the following:
1. Controller name and/or address
2. Controller type
3. Name and value of all information available from the
controller
Section 5 contains the following:
1. Name and value of all status information available from the
unit
2. Function that was active at error time
3. Logical and physical address of the unit before error
4. Logical and physical address of the unit at error
5. Transfer size and starting memory location of I/O if
applicable
Section 6 contains unit activity since start-up.
The default radix in these entries is decimal; however, some entries
may have numbers displayed in octal or binary.
5.2 TOPS-10 ENTRIES
The following sections list both the FULL and SHORT versions of the
entries that TOPS-10 can record in its system event file.
5.2.1 System Reload
The monitor generates a System Reload entry into the system event file
whenever it is loaded. Note that HALT, STOP, and CPU stopcode
information is also recorded in this entry, if applicable.
FULL
***********************************************
SYSTEM RELOAD
LOGGED ON 5-Aug-80 AT 0:16:39 MONITOR UPTIME WAS 0:00:38
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 190.
***********************************************
CONFIGURATION INFORMATION
SYSTEM NAME: RZ064A KL #1026/1042
MONITOR BUILT ON: 07-23-80
CPU SERIAL #: 1026.
STATES WORD: 771165,0
MONITOR VERSION %701(0)
RELOAD BREAKDOWN
CAUSE: SCHED
COMMENTS ;PUT 1
MEMORY ON-LINE AT RELOAD:
FROM: 0 P TO: 2048 P
SHORT
SEQ TIME 5-Aug-80
190. 0:16:39 RELOAD OF RZ064A KL #1026/1042 VERSION (70100)
BUILT ON 07-23-80 REASON SCHED
5.2.2 Non-Reload Monitor Error
Each time a JOB or DEBUG stopcode occurs, the monitor records the
information as a Non-Reload Monitor Error in the system event file.
The JOB stopcode endangers the integrity of the job currently running;
therefore, the monitor aborts the current job, then continues. A
DEBUG stopcode is not immediately harmful to any job or the system;
therefore, the monitor prints the stopcode message on the operator's
terminal (CTY) and then continues processing.
FULL
***********************************************
NON-RELOAD MONITOR ERROR
LOGGED ON 5-Aug-80 AT 10:51:49 MONITOR UPTIME WAS 2:26:26
DETECTED ON SYSTEM # 1042.
RECORD SEQUENCE NUMBER: 863.
***********************************************
SYSTEM NAME: RZ64C KL #1026/1042
SYSTEM SERIAL #: 1026.
MONITOR DATE: 07-23-80
MONITOR VERSION %701(0)
STOPCD NAME: BAZ
RESULT:
JOB #: 6.
USER'S ID: [1,2]
TTY NAME: 470
PROGRAM NAME: ACTDAE
CONTENTS OF AC'S AT STOPCD:
0: 20,0
1: 777642,377507
2: 0,100
3: 5777,371000
4: 526200,340000
5: 664145,663167
6: 440004,0
7: 0,50
10: 0,0
11: 0,505273
12: 0,250255
13: 47040,1
14: 0,1
15: 0,1
16: 0,4
17: 0,146
PI STATUS: 440004,0
SHORT
SEQ TIME 5-Aug-80
863. 10:51:49 STOPCD BAZ ON CPU SERIAL # 1026 FOR JOB # 6 ON 470
USER WAS [1,2] RUNNING ACTDAE
5.2.3 Crash Extract
A Crash Extract becomes a part of the system event file whenever the
program DAEMON starts. When DAEMON starts, it checks the system
search list for a CRASH.EXE file. If it finds one, it extracts the
information and appends it to the system event file.
NOTE
It is strongly recommended that, each time the monitor
is started, you save a dump as a CRASH.EXE file so
that DAEMON/SPEAR can provide a complete picture of
system activity. You can do this by saving each
monitor core image (dumping the crash) after each run;
that is, before PM or CM periods, before scheduled
reloads, after stand-alone periods, and so forth. To
save the core image, use the /D command to MONBTS.
Because DAEMON extracted the information from a saved crash, the date
and time and the monitor uptime in the header are the last values
recorded by the monitor before the crash.
5.2.4 Data Channel Error
When a channel detects an error or a device connected to a channel
detects an error during a data transfer, the monitor logs a Data
Channel Error into the system event file. The entry is made at the
time of the first error; thus, it can record either a soft or a hard
error.
Because the monitor programs the channel to stop when it encounters an
error (except on the last retry), this entry gives valuable
information about the word in error and its address, whether or not
the error was detected by the channel.
The Data Channel Error is generated only for DF10 data channels and is
not generated for devices using the KL10 internal channels (RH20).
FULL
***********************************************
DATA CHANNEL ERROR
LOGGED ON 1-Oct-80 AT 9:03:12 MONITOR UPTIME WAS 1:02:10
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 3122.
***********************************************
DATA CHANNEL ERROR TOTALS
NXM'S AND OVERRUNS: 1.
MEM PE SEEN BY CHANNEL: 0.
CONTROLLER DATA PE
OR CCW TERM CHK FAILS: 0.
CHANNEL COMMAND LIST BREAKDOWN
DEVICE USING CHANNEL: RPA5
INITIAL CONTROL WORD: 0,454
TERMINATION WD WRITTEN: 11323,313216
EXPECTED TERM. WORD: 11323,313413
CHANNEL COMMAND LIST: 0,454
774003,313213
0,0
3RD FROM LAST DATA WORD:0,0
2ND FROM LAST DATA WORD:0,0
LAST DATA WORD XFERRED: 0,0
SHORT
SEQ TIME 1-Oct-80
3122. 9:03:12 RPA5 CHANNEL ERROR COUNTS: NXM/MPE/DPE 1/0/0
WRITTEN TERM WD = 11323,313216
EXPECTED TERM WD = 11323,313413
5.2.5 DAEMON Started
The monitor logs this entry into the system event file each time
DAEMON is started, either after a system reload or a restart of
DAEMON. If DAEMON is modified at the site, the customer version
number should be edited to track the modifications.
FULL
***********************************************
DAEMON STARTED
LOGGED ON 5-Aug-80 AT 0:16:30 MONITOR UPTIME WAS 0:00:28
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 184.
***********************************************
DAEMON VERSION 20(757)
SHORT
SEQ TIME 5-Aug-80
184. 0:16:30 DAEMON STARTED--VERSION 20(757)
5.2.6 MASSBUS Disk Error
Any time the monitor detects an error in any portion of the MASSBUS
system (either hardware or software), DAEMON is called to collect and
record all pertinent hardware and software information in the error
file.
In this entry, the MEDIA ID is the value given to the disk when
structured with ONCE or TWICE. The STR ID is the logical name of the
media such as DSKB0. Both are recorded in the HOME block. The LBN
(logical block number) is the location of the first block in the
transfer. If LBN n, n+1, n+2, and n+3 were transferred, it is
possible that LBN n, n+1, and n+2 are good, but LBN n+3 is bad.
This value is broken into either the cylinder #, surface #, and sector
# (for disks) or the track # and sector # (for RS04s) to determine the
physical location of the failure.
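The breakdown of an LBN into cylinder, surface, and sector is a pair of integer divisions by the drive geometry. The Python sketch below illustrates this. It assumes an RP07 has 32 surfaces of 43 sectors each. This manual does not state those figures, but they reproduce the sample full-format entry in Section 4.3.3.5, where LBN 1074014 (octal) is shown as CYL: 212. SURF: 27. SECT: 3. Other disk types need their own geometry, and RS04s use track and sector instead.

```python
def lbn_to_chs(lbn: int, surfaces: int, sectors: int) -> tuple[int, int, int]:
    """Split a logical block number into (cylinder, surface, sector)."""
    blocks_per_cylinder = surfaces * sectors
    cylinder, within_cylinder = divmod(lbn, blocks_per_cylinder)
    surface, sector = divmod(within_cylinder, sectors)
    return cylinder, surface, sector


# Assumed RP07 geometry; it reproduces the sample entry noted above.
RP07 = {"surfaces": 32, "sectors": 43}

if __name__ == "__main__":
    lbn = 0o1074014                    # LBN AT START OF XFER, octal
    print(lbn_to_chs(lbn, **RP07))     # (212, 27, 3)
```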
The OPERATION AT ERROR is the text translation of the last command
issued to the device before the error was detected (presumably the
command that caused the error). The text translation should match the
translation of the bits in DATAI RHCR AT ERROR for the RH10 and DATAI
PTCR AT ERROR for an RH20. If the information does not match, look
for an error in the control bus.
NOTE
Because of dual-port capabilities for disk drives, the
physical device number can change according to the
port assignment. For example, on dual-ported drives,
one drive may be RPA3 on PORT A and RPC3 on PORT B.
MASSBUS devices store and make available significant amounts of
device-dependent information. The contents of all registers are
listed in the entry both at error time and after the last retry, along
with the difference between the two values. Text translations are
always from the AT ERROR value with the exception of the OFFSET
Register; offsets are not normally used.
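The manual does not say how the difference column is computed. One reading that reproduces every DIFF value in the sample full-format entry (Section 4.3.3.5) is a bitwise exclusive OR, which marks the bits that changed between the two snapshots. An absolute difference happens to fit those particular values too, so treat XOR as an assumption. The Python sketch below applies it to register values taken from that sample.

```python
def register_diff(at_error: int, at_end: int) -> int:
    """Bits that changed between the two snapshots (assumed XOR)."""
    return at_error ^ at_end


# Register values (octal) from the sample full-format RP07 entry.
SAMPLE = {
    "SR(01)": (0o51700, 0o11700),
    "DA(05)": (0o15404, 0o15407),
    "LA(07)": (0o1700, 0o700),
    "EP(16)": (0o1454, 0),
}

if __name__ == "__main__":
    for name, (err, end) in SAMPLE.items():
        print(f"{name}: {err:o} {end:o} {register_diff(err, end):o}")
```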
Note that software errors are checked only after the hardware has
completed the transfer without a detected error.
5.2.7 DX20 Device Error
The monitor records a DX20 Device Error in the system event file when
it detects an error in any portion of the MASSBUS system connected to
the DX20 channel interface.
In this entry, the MASSBUS REGISTER INFORMATION contains the nonzero
contents of all registers both at error time and after the last retry.
The SB (sense bytes) describe, in octal, the device type and status of
the device attached to the DX20.
5.2.8 Software Event
This entry is logged into the system event file when a user with
special privileges, such as the system operator, issues one of the
following monitor calls: POKE, RTTRP, SNOOP, or TRPSET. These
monitor calls have the following effect:
1. POKE changes the value of a word in monitor core.
2. RTTRP connects a device to or releases it from the realtime
interrupt facility.
3. SNOOP allows privileged programs to insert breakpoints in the
monitor that trap to a user program. The user program must
be locked in core when the trap occurs. This feature is used
for fault insertion, performance analysis, and trace
functions.
4. TRPSET prevents jobs other than the calling job from running.
You can use this call to guarantee fast response to realtime
interrupts.
For more information on monitor calls, refer to the TOPS-10 Monitor
Calls Manual.
FULL
***********************************************
SOFTWARE EVENT
LOGGED ON 14-Jul-80 AT 8:56:45 MONITOR UPTIME WAS 0:42:42
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 1.
***********************************************
EVENT TYPE: POKE
JOB #: 46.
USER PPN: [10,5324]
LOCATION OF USER:
NODE:26
LINE:154
TTY154
PROGRAM: SPICE
STORED DATA VALUES:
0,34030
SHORT
SEQ TIME 14-Jul-80
1. 8:56:45 SOFTWARE EVENT TYPE: POKE BY JOB 46 USER WAS [10,5324]
RUNNING SPICE AT NODE: 26 LINE: 154 TTY154
5.2.9 Configuration Status Change
The monitor records a Configuration Status Change whenever the system
operator marks disk units or sections of core memory on-line or
off-line. The system operator uses either the CONFIG program or the
SET command to change the system configuration. These tools are
useful because they can keep further errors from affecting users until
a unit can be repaired, and they can be used to split and later join
dual CPU systems. For more information on the CONFIG program, refer
to the file CONFIG.DOC.
With the SET command, the system operator can also give a 2-character
reason for the change in configuration. Any two characters can be
used, but the following codes are suggested:
PM - preventive maintenance
CM - corrective maintenance
DN - unit is down
OT - other
CAUTION
When the system operator adds memory to the system,
the monitor checks to verify the availability of the
specified addresses. Mistakes are reported at the
operator's terminal (CTY), but the error logging
system treats these as valid NXMs and generates the
appropriate NXM reports. You can identify a NXM
report of this type because no physical memory is
placed off-line and the user's directory is [1,2].
FULL
***********************************************
CONFIGURATION STATUS CHANGE
LOGGED ON 4-Aug-80 AT 14:06:05 MONITOR UPTIME WAS 1:44:50
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 15.
***********************************************
COMMAND:DETACH
DEVICE:RNA0
SHORT
SEQ TIME 4-Aug-80
15. 14:06:05 CONFIGURATION CHANGE DETACHED RNA0
5.2.10 System Log Entry
The monitor records a System Log Entry when the system operator enters
a log entry into the system event file with the OPR program.
A system operator, or anyone with operator privileges, can make an
entry into the system event file by doing the following:
1. Run the OPR program:
.OPR<RET>
OPR>
2. When you see the prompt, specify the REPORT command:
OPR>REPORT
3. Use the following syntax:
OPR>REPORT user text <RET>
where user can be a directory name, a device name, or both,
and text can be a single-line or multiple-line message.
For more information on OPR, refer to the TOPS-10 Operator's Command
Language Reference Manual.
FULL
***********************************************
SYSTEM LOG ENTRY
LOGGED ON 15-Sep-80 AT 10:40:12 MONITOR UPTIME WAS 5:30:10
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 37.
***********************************************
ENTRY CREATED BY:
JOB #, TTY #: 77,502
P,PN: [27,2617]
WHO: MASELL
DEV: TTY
MESSAGE: : THIS IS A TEST.
SHORT
SEQ TIME 15-Sep-80
37. 10:40:12 SYSTEM LOG ENTRY BY MASELL FOR DEVICE TTY ON TTY # 502
MESSAGE: : THIS IS A TEST.
5.2.11 Software Requested Data
At certain times during system operation, problems can arise that
are not easily understood. Most frequently, the source of the failure
is a hardware fault that is detected by the software. To
troubleshoot this type of failure, you may need additional data from
the monitor. You can obtain this information by patching the monitor
to collect the information at the proper point and pass it to the
system event file for listing.
CAUTION
Patching a monitor can easily produce drastic,
undesired results such as loss of customer data,
system crashes, and so forth. Be EXTREMELY CAREFUL
and enlist the help of someone who is familiar with
the monitor structure and internal workings.
SPEAR lists the information in this entry in both octal and SIXBIT.
***********************************************
SOFTWARE REQUESTED DATA
LOGGED ON 4-Jan-81 AT 6:50:34 MONITOR UPTIME WAS 3:13:34
DETECTED ON SYSTEM # 2263.
RECORD SEQUENCE NUMBER: 1.
***********************************************
OCTAL VALUE SIXBIT VALUE
504554,545700 HELLO
675762,544400 WORLD
123456,654321 *<NUC1
654321,123456 UC1*<N
555762,450063 MORE S
517042,516400 IXBIT
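SIXBIT packs six 6-bit characters into each 36-bit word. Each code is the ASCII character code minus 040 (octal), so code 0 is a space. The sketch below decodes the OCTAL VALUE column of the sample above and reproduces its SIXBIT VALUE column. The function names are illustrative and are not part of SPEAR.

```python
# Hypothetical helpers: decode SPEAR's 'left,right' octal words as SIXBIT.
HALF_MASK = 0o777777  # one 18-bit halfword


def word_from_octal(text: str) -> int:
    """Combine 'left,right' octal halfwords into one 36-bit integer."""
    left, right = (int(part, 8) for part in text.split(",", 1))
    return ((left & HALF_MASK) << 18) | (right & HALF_MASK)


def sixbit_to_text(word: int) -> str:
    """Unpack six 6-bit codes, leftmost first, and map each to ASCII."""
    chars = []
    for shift in range(30, -1, -6):          # 30, 24, 18, 12, 6, 0
        code = (word >> shift) & 0o77
        chars.append(chr(code + 0o40))       # SIXBIT 0 is ASCII space (040)
    return "".join(chars)


if __name__ == "__main__":
    sample = ["504554,545700", "675762,544400", "123456,654321",
              "654321,123456", "555762,450063", "517042,516400"]
    for value in sample:
        print(value, repr(sixbit_to_text(word_from_octal(value))))
```

When run against the six sample words, the sketch prints HELLO, WORLD, *<NUC1, UC1*<N, MORE S, and IXBIT, with a trailing space wherever the last code is 0. This matches the listing above.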
5.2.12 Magtape System Error
The monitor records any magtape errors it detects as a Magtape System
Error. Errors that are non-recoverable are classified as HARD;
recoverable errors are classified as SOFT.
If the monitor detects a data channel error, it records the
appropriate information under error code 6 (Data Channel Error).
After a user issues an UNLOAD command or UUO, the monitor records the
performance statistics for the tape, including the total number of
characters transferred and the number of errors (soft read, soft
write, hard read, hard write) encountered.
Note that if an unlabeled tape is mounted without any kind of ID,
no MEDIA is identified in the error file.
5.2.13 Front End Device Report
You will find a Front End Device Report in the system event file when
the front end passes a packet of error information to the monitor.
This information contains errors detected by the front end and KLCPU
hardware and software. If the device being reported on is unknown to
SPEAR, the entry is reported in octal.
FULL
***********************************************
FRONT END DEVICE REPORT
LOGGED ON 3-Nov-80 AT 9:44:10 MONITOR UPTIME WAS 2 DAYS 14:37:29
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 67.
***********************************************
CPU #,DTE #: 0,0
FE SOFTWARE VER: 0.
DEVICE: KLCPU
STD. STATUS: 100 = ERROR LOG REQUEST,
KL RELOAD STATUS FROM FRONT END: 0 = NO ERROR BITS DETECTED
SHORT
SEQ TIME 3-Nov-80
67. 9:44:10 KLCPU STD STAT=100 RELOAD STAT=0
5.2.14 Front End Reload
The monitor logs a Front End Reload entry into the system event file
when it determines that one of its front ends (attached to a DTE on a
KL10 only) has crashed and has attempted to reload. Before rebooting
the front end, the monitor dumps the crashed front end's core image to
a disk file for later analysis.
5.2.15 KS10 Halt Status Block
The monitor records a KS10 Halt Status Block entry into the system
event file when the KS10 microcode executes a HALT stopcode. A
snapshot of the condition of the system is taken just prior to the
HALT, and this information is written as the entry.
FULL
***********************************************
KS10 HALT STATUS BLOCK
LOGGED ON 9-Feb-81 AT 14:21:55 MONITOR UPTIME WAS 0:01:12
DETECTED ON SYSTEM # 4145.
RECORD SEQUENCE NUMBER: 1.
***********************************************
HALT STATUS CODE: 2
PROGRAM COUNTER: 1000
HALT STATUS BLOCK
MAG: 0,2
PC: 0,1000
HR: 777756,4
AR: 0,0
ARX: 377777,777777
BR: 0,1000
BRX: 254000,1000
ONE: 241200,200000
EBR: 0,1
UBR: 0,31463
MASK: 774777,470177
FLAGS,,PAGE FAIL WORD: 0,1
PI STATUS: 400060,120000
XWD1: 500101,553000
T0: 777777,777777
T1: 4000,0
VMA: 0,177
SHORT
SEQ TIME 9-Feb-81
1. 14:21:55 HALT STATUS CODE = PC = 0,1000 HR = 254000,1000
PAGE FAIL = 4000,0 PI = 0,177 FLAGS,,VMA = 0,0
5.2.16 Magtape Statistics
Each time an UNLOAD UUO or monitor command is given to a tape drive,
the monitor creates a Magtape Statistics entry. The same information
is printed in summary form on both the user's terminal and the
operator's terminal (CTY).
In this entry, the REEL IDENTIFICATION is the name supplied to the
monitor at the time the tape was mounted. It has nothing to do with
any label information found on the tape. The CHARS READ is the number
of characters or frames of tape read on this unit since the last
UNLOAD command was issued to this unit. The CHARS WRITTEN is the
number of characters or frames of tape written on this unit since the
last UNLOAD command was issued.
FULL
***********************************************
MAGTAPE STATISTICS
LOGGED ON 4-Aug-80 AT 13:40:05 MONITOR UPTIME WAS 1:18:50
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 5.
***********************************************
MAGTAPE STATISTICS
UNIT NAME: MTB261
REEL IDENTIFICATION:
USER'S P,PN: 1,2
CHARS READ: 2720.
CHARS WRITTEN: 0.
SOFT READ ERRORS: 0.
HARD READ ERRORS: 1.
SOFT WRITE ERRORS: 0.
HARD WRITE ERRORS: 0.
SHORT
SEQ TIME 4-Aug-80
5. 13:40:05 MTB261 STATISTICS READ CH/H/S: 2720/1/0 WRITE CH/H/S: 0/0/0
5.2.17 Disk Statistics
This entry reports the performance of each disk unit since the monitor
was loaded. It is useful for computing the disk error rate and disk
throughput. DAEMON does not normally record this information in the
system event file because it takes up a great deal of space.
Installations that want this entry should reassemble DAEMON with the
conditional assembly switch FTUSN set.
The monitor records this entry type for each disk unit on the system
each hour. You can find the same type of information for each monitor
run in the Crash Extract entry (Section 5.2.3).
5.2.18 DL10 Communications Error
The monitor records a DL10 Communications Error into the system event
file when the DL10 detects an error on the communications link.
FULL
***********************************************
DL10 COMMUNICATIONS ERROR
LOGGED ON 4-Aug-80 AT 16:45:09 MONITOR UPTIME WAS 4:23:54
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 86.
***********************************************
UNIT: DC76
DL10 PORT: 0
ERROR: NO ERROR BITS DETECTED
11 PROGRAM NAME: DC76
CONTROLLER INFORMATION:
CONI DLC: 60,200204 = P1 ENB,
DATAI DLC: 0,750 = NO ERROR BITS DETECTED
CONI DLB (R=0): 0,5037
CONI DLB (R=1): 40000,6005
CONI DLB (R=2): 2000,46401
CONI DLB (R=3): 577777,46400
DATAI DLB (R=1)(MB): 0,0
SHORT
SEQ TIME 4-Aug-80
86. 16:45:09 DL10 ERROR ON PDP11 # 0 CONI DLC = 60,200204
DATAI DLC = 0,750
5.2.19 KL10 Parity or NXM Interrupt
The monitor records a KL10 Parity or NXM Interrupt in the system event
file when the KL10 detects a parity error or an attempt to access a
nonexistent memory location.
The PC AT INTERRUPT is the status of the program counter at the time
of the parity or nonexistent memory interrupt. The CONI PI AT
INTERRUPT is the status of the Priority Interrupt system at the time
of the parity or nonexistent memory interrupt.
5.2.20 KS10 NXM Trap
When the KS10 detects an attempt to read a nonexistent memory location,
the monitor records a KS10 NXM Trap in the system event file. A trap
stops execution during the current instruction.
FULL
***********************************************
KS10 NXM TRAP
LOGGED ON 22-Mar-81 AT 0:11:50 MONITOR UPTIME WAS 0:23:18
DETECTED ON SYSTEM # 4608.
RECORD SEQUENCE NUMBER: 1.
***********************************************
ERROR DETECTED ON CPS0
PC AT TRAP: 1,145267
CONI PI AT TRAP: 0,2377
PAGE FAIL WORD: 200013,770000
PAGE FAIL CODE: 20 = I-O NXM
PHYSICAL MEMORY ADDRESS AT TRAP: 0,0
USER'S ID AT TRAP: [307,5515]
USER'S PROGRAM: TSTUBA
# OF RECOVERABLE TRAPS: 0.
# OF NON-RECOVERABLE TRAPS: 0.
SHORT
SEQ TIME 22-Mar-81
1. 0:11:50 NXM TRAP PFW = 200013,770000 PMA = 0,0 NON
RECOVERABLE FAILURE RETRYS: 31
USER AT TRAP [307,5515] RUNNING TSTUBA
5.2.21 KL10 or KS10 Parity Trap
The monitor records a KL10 or KS10 Parity Trap when either the KL10 or
KS10 detects an internal parity error, not necessarily in memory.
In this entry, the PHYSICAL MEMORY ADDRESS AT TRAP gives the location
of the parity error where the trap occurred.
FULL
***********************************************
KL10 OR KS10 PARITY TRAP
LOGGED ON 4-Feb-81 AT 17:37:14 MONITOR UPTIME WAS 0:03:13
DETECTED ON SYSTEM # 2136.
RECORD SEQUENCE NUMBER: 1.
***********************************************
ERROR DETECTED ON CPL0
PC AT TRAP: 316000,230
CONI PI AT TRAP: 0,377
PHYSICAL MEMORY ADDRESS AT TRAP: 547001,436241
USER'S ID AT TRAP: [1,2]
USER'S PROGRAM: KLPAR4
PAGE FAIL WORD: 767000,241
PAGE FAIL CODE: 36 = AR
BAD DATA WORD: 252525,252525
GOOD DATA WORD: 0,0
DIFFERENCE: 252525,252525
RECOVERY: CRASH USER
RETRY COUNT:
W CACHE: 4.
W-O CACHE: 0.
ERROR DURING CACHE SWEEP TO CORE
# OF RECOVERABLE TRAPS: 0.
# OF NON-RECOVERABLE TRAPS: 3.
SHORT
SEQ TIME 4-Feb-81
1. 17:37:14 PARITY TRAP PFW = 767000,241 PMA = 547001,436241
NON RECOVERABLE FAILURE USER AT TRAP [1,2]
RUNNING KLPAR4 RETRIES: 4
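In both trap samples, the PAGE FAIL CODE equals bits 1 through 5 of the PAGE FAIL WORD in PDP-10 bit numbering, where bit 0 is the leftmost of the 36 bits. The sketch below extracts that field. This rule is inferred from the two samples only (200013,770000 gives 20 and 767000,241 gives 36) and does not describe SPEAR's own code. The function name is hypothetical.

```python
# Hypothetical helper: pull the page fail code out of a PAGE FAIL WORD
# printed in 'left,right' octal notation. PDP-10 bits 1-5 of the 36-bit
# word are bit positions 12-16 (counting from 0 at the right) of the
# 18-bit left half. This layout is an inference from the sample entries.

def page_fail_code(pfw: str) -> int:
    left_half = int(pfw.split(",", 1)[0], 8)
    return (left_half >> 12) & 0o37


if __name__ == "__main__":
    for pfw in ("200013,770000", "767000,241"):
        print(pfw, format(page_fail_code(pfw), "o"))
```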
5.2.22 Memory Sweep for NXM
When the monitor detects an attempt to access a nonexistent memory
location in user core, it sweeps memory looking for more NXMs. The
monitor then records the results of this sweep as a Memory Sweep for
NXM in the system event file.
The ADDRESSES DETECTED BY SWEEP field gives the locations, if any, of
any additional nonexistent memory references found by the sweep.
FULL
***********************************************
MEMORY SWEEP FOR NXM
LOGGED ON 1-Oct-80 AT 9:03:14 MONITOR UPTIME WAS 1:02:21
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 3124.
***********************************************
NXM CORE SWEEP TOTALS FOR CPL0
REPRODUCIBLE: 0.
NON-REPRODUCIBLE: 0.
DETECTED BY DATA
CHANNEL BUT NOT
BY CPU: 20.
SWEEP INFORMATION:
ERRORS DETECTED: 0.
LOGICAL "AND" OF BAD
PHYSICAL ADDRESSES: 777777,777777
LOGICAL "OR" OF BAD
PHYSICAL ADDRESSES: 0,0
MEMORY PLACED OFF-LINE:
SHORT
SEQ TIME 1-Oct-80
3124. 9:03:14 NXM SWEEP ON CPL0 # OF ERRORS SEEN = 0
5.2.23 Memory Sweep for Parity
When the monitor detects a parity error on a read attempt, it sweeps
memory looking for more parity errors. The results of the sweep are
recorded in the system event file as a Memory Sweep for Parity.
The SWEEP INFORMATION contains the number of words found with bad
parity. It also contains the logical AND and logical OR of the bad
addresses and bad contents.
FULL
***********************************************
MEMORY SWEEP FOR PARITY
LOGGED ON 4-Nov-80 AT 8:39:53 MONITOR UPTIME WAS 0:35:34
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 2026.
***********************************************
DATA PARITY CORE SWEEP TOTALS FOR CPL0
REPRODUCIBLE: 0.
NON-REPRODUCIBLE: 0.
USER ENABLED: 0.
CORE SWEEPS: 1.
DETECTED BY DATA
CHANNEL BUT NOT
BY CPU: 1.
SWEEP INFORMATION:
ERRORS DETECTED: 0.
LOGICAL "AND" OF BAD
PHYSICAL ADDRESSES: 777777,777777
LOGICAL "OR" OF BAD
PHYSICAL ADDRESSES: 0,0
LOGICAL "AND" OF BAD DATA: 777777,777777
LOGICAL "OR" OF BAD DATA: 0,0
SHORT
SEQ TIME 4-Nov-80
2026. 8:39:53 DATA PARITY CORE SWEEP FOR CPL0 # OF ERRORS SEEN = 0
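The logical AND and OR fields summarize every bad address or bad data word found by a sweep. The sketch below assumes that each value is folded into a running AND that starts at all ones and a running OR that starts at zero. The sweep samples in Sections 5.2.22 and 5.2.23 are consistent with that assumption: both show 777777,777777 and 0,0 when no errors were detected. Bits that are 1 in the AND, or 0 in the OR, have the same value in every failing address, which can help narrow a problem to particular address bits. The function names are hypothetical.

```python
# Hypothetical sketch: fold bad values into running AND/OR summaries,
# ASSUMING AND starts at all ones and OR starts at zero.
from functools import reduce
import operator

WORD_MASK = (1 << 36) - 1   # all 36 bits set
HALF_MASK = 0o777777        # one 18-bit halfword


def format_halfwords(word: int) -> str:
    """Print a 36-bit value in SPEAR's 'left,right' octal notation."""
    return f"{(word >> 18) & HALF_MASK:o},{word & HALF_MASK:o}"


def sweep_summary(bad_values: list[int]) -> tuple[int, int]:
    """Running AND (starts at all ones) and OR (starts at zero) of the values."""
    and_all = reduce(operator.and_, bad_values, WORD_MASK)
    or_all = reduce(operator.or_, bad_values, 0)
    return and_all, or_all


def common_bits(and_all: int, or_all: int) -> int:
    """Mask of bit positions that hold the same value in every bad value."""
    always_one = and_all
    always_zero = ~or_all & WORD_MASK
    return always_one | always_zero


if __name__ == "__main__":
    and_all, or_all = sweep_summary([])          # no errors detected
    print(format_halfwords(and_all), format_halfwords(or_all))
    and_all, or_all = sweep_summary([0o1000, 0o1004, 0o1010])
    print(format_halfwords(and_all), format_halfwords(or_all))
    print(format_halfwords(common_bits(and_all, or_all)))
```

With no input values, the first line prints 777777,777777 0,0, which matches the sample fields.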
5.2.24 CPU Status Block
The monitor records this entry into the system event file after
recovering from a system crash. At the time of the crash, a snapshot
is taken of the condition of all the components of the CPU (such as
controllers, channels, RH20s, the pager, and so forth). When the
system recovers, the monitor extracts this information from the
CRASH.EXE file and places it in the system event file as a CPU Status
Block.
This entry contains the condition of the registers and channels just
prior to the crash. Also, the SBDIAG FUNCTIONS column contains the
SBUS diagnostic functions.
FULL
***********************************************
** THIS ENTRY COPIED FROM A SAVED CRASH **
CPU STATUS BLOCK
LOGGED ON 5-Aug-80 AT 0:11:25 MONITOR UPTIME WAS 11:50:09
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 185.
***********************************************
APRID = 231,342002
CONI APR = 7760,3
RDERA = 604000,7427
CONI PI = 0,10377
DATAI PAG = 701100,3
CONI PAG = 0,620001
CONI RH0 THRU RH7
000000,,002445 000000,,006400 000000,,002445 000000,,002445
000000,,000000 000000,,000000 000000,,000000 000000,,000000
CONI DTE0 THRU DTE3
000000,,020014 000000,,100000 000000,,100014 000000,,100014
EPT LOCATIONS 0 THRU 37 (CHANNEL LOGOUT AREA)
200000,,000454 500000,,000456 600000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
200000,,000454 500000,,000455 600001,,457000 000000,,000000
200000,,000454 500000,,000455 600001,,014660 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
EPT LOCATIONS 140 THRU 177 (DTE CONTROL BLOCKS)
141000,,413160 241000,,223676 264000,,057516 000000,,000000
000000,,000442 000000,,057054 000000,,000030 000000,,057136
000000,,000000 000000,,000000 264000,,057556 000000,,000000
000000,,000443 000000,,057053 000000,,000030 000000,,057166
241000,,224302 341000,,224563 264000,,057616 000000,,000000
000000,,000444 000000,,057052 000000,,000030 000000,,057216
341000,,232743 141000,,224000 264000,,057656 000000,,000000
000000,,000445 000000,,057051 000000,,000030 000000,,057246
UPT LOCATIONS 424 THRU 427 (UUO AREA)
000000,,000000 000000,,000000 000000,,000000 000000,,000000
UPT LOCATIONS 500 THRU 503 (PAGE FAIL AREA)
000000,,000000 304000,,112667 004000,,566102 000000,,000000
AC BLOCK 6 LOCATIONS 0 THRU 3 AND 12
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000
AC BLOCK 7 LOCATIONS 0 THRU 2
255000,,000000 000000,,640010 000000,,000000
SBDIAG FUNCTIONS
CTRLR FUNCTION 0 FUNCTION 1
4 005740,,041736 000200,,000000
SHORT
SEQ TIME 5-Aug-80
185. 0:11:25 CPU STATUS BLOCK APRID = 231,342002 CONI APR = 7760,3
CONI PI = 0,10377 CONI PAG = 0,620001
DATAI PAG = 701100,3
5.2.25 Device Status Block
The monitor records this entry into the system event file after
recovering from a system crash. At the time of the crash, a snapshot
is taken of the condition of all the I/O devices (such as
line printers, card readers, disk drives, and so forth). When the
system recovers, the monitor extracts this information from the
CRASH.EXE file and places it in the system event file as a Device
Status Block.
FULL
***********************************************
** THIS ENTRY COPIED FROM A SAVED CRASH **
DEVICE STATUS BLOCK
LOGGED ON 5-Aug-80 AT 0:11:25 MONITOR UPTIME WAS 11:50:09
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 186.
***********************************************
CONI 20 : 117,63202
CONI 24 : 0,32003
CONI 120 : 0,0
CONI 104 : 0,0
CONI 100 : 0,0
CONI 240 : 0,0
CONI 320 : 0,410000
CONI 324 : 770010,4100
CONI 150 : 3,0
CONI 124 : 0,2400
CONI 140 : 0,40
CONI 344 : 0,0
CONI 340 : 0,0
CONI 220 : 1,420004
CONI 170 : 0,0
CONI 174 : 0,0
CONI 270 : 0,0
CONI 274 : 4000,5
CONI 360 : 0,0
CONI 250 : 0,0
CONI 254 : 0,0
CONI 260 : 0,0
CONI 264 : 0,0
CONI 334 : 0,0
CONI 330 : 0,0
CONI 64 : 60,200224
CONI 60 : 0,5037
CONI 164 : 0,0
CONI 160 : 0,0
CONI 110 : 0,400000
CONI 154 : 2,0
CONI 234 : 0,0
CONI 230 : 307620,32400
CONI 144 : 0,0
DATAI 0 : 0,0
DATAI 170 : 0,0
DATAI 174 : 0,0
DATAI 270 : 0,0
DATAI 274 : 4003,3
DATAI 360 : 0,0
DATAI 250 : 0,0
DATAI 254 : 0,0
DATAI 260 : 0,0
DATAI 264 : 0,0
DATAI 64 : 0,770
DATAI 60 : 0,162
DATAI 164 : 0,0
DATAI 160 : 0,0
SHORT
SEQ TIME 5-Aug-80
186. 0:11:25 DEVICE STATUS BLOCK
5.2.26 Line Printer Error
The monitor records any errors detected by the LP100 controller as a
Line Printer Error in the system event file. Note that if the line
printer is taken off-line to add paper or change forms, the monitor
does not record this event.
The LAST DATA WORD SENT can help to determine the location of a data
parity error, if one exists. Also, the CONI AT ERROR text translation
contains significant error bits to describe the mode of operation when
the failure occurred.
FULL
***********************************************
LINE PRINTER ERROR
LOGGED ON 22-Mar-81 AT 0:11:50 MONITOR UPTIME WAS 0:23:18
DETECTED ON SYSTEM # 1536.
RECORD SEQUENCE NUMBER: 1.
***********************************************
UNIT NAME: LPT0
CONTROLLER TYPE: LP100
LAST DATA WORD SENT: 0,123
CONI AT ERROR: 200045,226465 = NOT READY,VFU ERROR,OFF LINE,
VFU TYPE: DIRECT ACCESS
CHARACTER SET: VARIABLE
PAGE COUNTER: 37.
SHORT
SEQ TIME 22-Mar-81
1. 0:11:50 LPT0 LP100 ERROR CONI LP = 200045,226465
5.2.27 Unit Record Error
The monitor logs a Unit Record Error into the system event file when
it detects an error on a unit-record device such as a line printer, a
card reader, a card punch, or a plotter.
FULL
***********************************************
UNIT RECORD ERROR
LOGGED ON 8-Sep-80 AT 12:06:44 MONITOR UPTIME WAS 3:58:38
DETECTED ON SYSTEM # 1026.
RECORD SEQUENCE NUMBER: 314.
***********************************************
UNIT NAME: LPT262
CONTROLLER TYPE: LP100
DEVICE TYPE: LPT
USER ID: [1,2]
PROGRAM NAME: LPTSPL
VFU TYPE: DAVFU
CHARACTER SET: 96 CHARACTER
CONI AT ERROR: 307216,632444 NOT READY,VFU ERROR,OFF LINE,
LAST DATA WD: 0,0
SHORT
SEQ TIME 8-Sep-80
314. 12:06:44 LPT262 ERROR FOR USER [1,2] RUNNING LPTSPL
CONI LP100 = 307216,632444
5.3 TOPS-20 ENTRIES
The following sections list both the FULL and SHORT versions of the
entries that TOPS-20 can record in its system event file. Note that
the network entries for DECnet-20 version 2.1 are listed separately in
Section 5.4. Network entries for DECnet-20 version 3.0 are listed in
Section 5.5.
5.3.1 TOPS-20 System Reloaded
Every time the monitor is loaded, a TOPS-20 System Reloaded entry is
written into the system event file, explaining why the system was
reloaded. If the system is on auto-reload and a BUGHLT occurs, the
BUGHLT address is listed and the TOPS-20 BUGHLT-BUGCHK entry, Section
5.3.2, is also written into the system event file.
FULL
***********************************************
TOPS-20 SYSTEM RELOADED
LOGGED ON Mon 23 Jun 80 08:46:31 MONITOR UPTIME WAS 0:00:22
DETECTED ON SYSTEM # 2116.
RECORD SEQUENCE NUMBER: 22.
***********************************************
CONFIGURATION INFORMATION
SYSTEM NAME: System 2116 TOPS-20 Monitor 4(3230)
MONITOR BUILT ON: Wed 28 Nov 79 11:00:01
CPU SERIAL #: 2116.
MONITOR VERSION: 4(3230)
U-CODE VERSION: 0
RELOAD BREAKDOWN:
SHORT
SEQ TIME Mon 23 Jun 80
22. 08:46:31 RELOAD OF System 2116 The Big Orange Welcomes You, TOPS-20
Monitor 4(3230) VERSION 4(3230)
BUILT ON Wed 28 Nov 79 11:00:01 REASON
5.3.2 TOPS-20 BUGCHKs and BUGHLTs
When the monitor detects a monitor software error (a BUGHLT, BUGCHK,
or BUGINF), it records a TOPS-20 BUGHLT-BUGCHK entry into the system
event file. The most serious of the three errors is a BUGHLT, which crashes
the system. At this point, something is seriously wrong, and the
monitor does not have enough integrity to attempt any further error
recovery. The monitor does, however, collect pertinent information
for error recording. When the system is reloaded, the information is
extracted from a crash dump and recorded in the system event file.
BUGCHK and BUGINF are less serious, perhaps correctable,
monitor-detected errors that can affect only particular users instead
of the entire system. These errors may or may not crash the system
depending on the error that occurs.
The number of errors since reload is included in this entry because
only five occurrences of this entry type are allowed in the monitor's
error recording buffer at any one time. In the case of an error
occurring in a tight loop, more than five entries could overflow the
buffer, and the information for the first occurrence might be lost.
These numbers should increment by one for each entry; however, if the
sequence is broken, it indicates that more than five entries occurred
before the error-logger module of the monitor could empty the buffer.
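Checking that sequence only requires a scan over consecutive entries. The sketch below takes the # OF ERRORS SINCE RELOAD values from successive BUGHLT-BUGCHK entries of one monitor run and reports each break, together with the number of entries that appear to have been lost. Because the count runs since reload, each monitor run should be checked separately. The function name is hypothetical.

```python
# Hypothetical sketch: find breaks in the '# OF ERRORS SINCE RELOAD'
# sequence of consecutive BUGHLT-BUGCHK entries from one monitor run.

def find_sequence_breaks(error_counts: list[int]) -> list[tuple[int, int, int]]:
    """Return (previous, next, missing) wherever a value does not go up by one.
    A negative 'missing' means the values went backward, for example
    because the input spans a reload."""
    breaks = []
    for previous, current in zip(error_counts, error_counts[1:]):
        if current != previous + 1:
            # Entries between previous and current never reached the file.
            breaks.append((previous, current, current - previous - 1))
    return breaks


if __name__ == "__main__":
    print(find_sequence_breaks([1, 2, 3, 4]))    # unbroken sequence
    print(find_sequence_breaks([1, 2, 9, 10]))   # entries 3 through 8 were lost
```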
The FORK # and JOB # in the entry are the numbers associated with the
current user at the time of the error. A value of -1 or 777777
indicates that the monitor was performing an overhead function (such
as scheduling) and that there was no current user. Note that the FORK
# and JOB # indicate the current user, and not necessarily the user
being serviced by the monitor interrupt-level routines.
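The values -1 and 777777 are the same 18-bit halfword read in two ways: 777777 octal is -1 in 18-bit two's complement. The following minimal sketch shows that conversion and uses it to tell an overhead entry from one with a real current user. The function names are hypothetical.

```python
# Hypothetical helpers: interpret the FORK # or JOB # halfword.

def halfword_to_signed(value: int) -> int:
    """Interpret an 18-bit halfword as a two's-complement signed number."""
    value &= 0o777777
    return value - (1 << 18) if value & 0o400000 else value


def is_monitor_overhead(fork_or_job: int) -> bool:
    """True when the FORK # or JOB # field indicates no current user."""
    return halfword_to_signed(fork_or_job) == -1


if __name__ == "__main__":
    print(is_monitor_overhead(0o777777), is_monitor_overhead(-1),
          is_monitor_overhead(0o72))   # True True False
```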
All BUGHLTs now reside in a monitor module, BUGS.MAC. This module
includes a description of what might have caused the BUGHLT and some
corrective action that you can take.
FULL
***********************************************
TOPS-20 BUGHLT-BUGCHK
LOGGED ON Mon 16 Jun 80 11:10:19 MONITOR UPTIME WAS 3:10:48
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 25.
***********************************************
ERROR INFORMATION:
DATE-TIME OF ERROR: Mon 16 Jun 80 11:10:09
# OF ERRORS SINCE RELOAD: 1.
FORK # & JOB #: 72,0
USER'S LOGGED IN DIR: OPERATOR
PROGRAM NAME: SYSJOB
ERROR: BUGINF
ADDRESS OF ERROR: 644111
NAME: DN20ST
DESCRIPTION: DTESRV- DN20 STOPPED
CONI APR: 7740,3 = NO ERROR BITS DETECTED
CONI PAG: 0,660132
DATAI PAG: 700100,1246
CONTENTS OF AC'S:
0: 0,0
1: 777775,1
2: 0,1
3: 0,0
4: 0,0
5: 0,0
6: 0,0
7: 0,0
10: 0,0
11: 0,0
12: 0,0
13: 0,0
14: 0,0
15: 0,0
16: 60000,0
17: 777505,335504
PI STATUS: 0,177
ADDITIONAL DATA ITEMS: 1
0,1
ERA: 602000,5504 = WD #3 MEMORY READ
BASE PHY. MEM ADDR.
AT FAILURE: 5504
SHORT
SEQ TIME Mon 16 Jun 80
25. 11:10:19 BUGINF DN20ST AT Mon 16 Jun 80 11:10:09 USER OPERATOR
RUNNING SYSJOB CONI APR= 7740,3 CONI PAG= 0,660132
ERA= 602000,5504
5.3.3 MASSBUS Device Error
Every time the monitor detects an error in the MASSBUS system, a
MASSBUS Device Error is recorded in the system event file. The
MASSBUS system includes the MASSBUS devices RP04, RP05, RP06, TU45,
and RM03; the RH20 controller (RH11 and UBA for 2020); and certain
errors occurring in the channel logic.
The unit name in this entry refers to the physical MASSBUS unit active
at the time of the error. This is a 5-character name in the format:
xxabc
where
xx is the device type, DP (disk pack) or MT (magtape). For
example, DP220 refers to disk pack 220.
a is the logical address of the RH20 controller for this
device (0-7), or of the RH11 and UBA in a 2020 configuration.
b is the logical MASSBUS address for this device (0-7). For
magtape units, this is the TM02 address on the MASSBUS.
c is the slave number of a magnetic tape unit. For RP04s,
RP05s, and RP06s, this number is always 0.
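Because each position in the name has a fixed meaning, a unit name can be split mechanically. The sketch below parses an xxabc name into its four fields. It assumes that c, like a and b, is a single digit from 0 to 7, and the class and function names are hypothetical. As a consistency check, the drive statistics samples in Section 5.3.5 report DP401 on RH20 # 4 and MT301 on RH20 # 3, which matches the a digit.

```python
# Hypothetical sketch: split a TOPS-20 MASSBUS unit name of the form xxabc.
from typing import NamedTuple


class MassbusUnit(NamedTuple):
    device_type: str       # xx: "DP" (disk pack) or "MT" (magtape)
    controller: int        # a: logical RH20 address, 0-7
    massbus_address: int   # b: logical MASSBUS address (TM02 for magtapes), 0-7
    slave: int             # c: magtape slave number (0 for RP04/RP05/RP06)


def parse_unit_name(name: str) -> MassbusUnit:
    """Split a 5-character unit name such as 'DP220' into its fields."""
    name = name.strip().upper()
    if len(name) != 5 or name[:2] not in ("DP", "MT"):
        raise ValueError(f"not an xxabc unit name: {name!r}")
    digits = name[2:]
    # ASSUMPTION: c is limited to 0-7 here, like a and b.
    if any(ch not in "01234567" for ch in digits):
        raise ValueError(f"expected digits 0-7 after the device type: {name!r}")
    a, b, c = (int(ch) for ch in digits)
    return MassbusUnit(name[:2], a, b, c)


if __name__ == "__main__":
    for unit in ("DP220", "MT000", "DP401"):
        print(unit, parse_unit_name(unit))
```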
The following is a MASSBUS Device Error from an RP07 disk
drive:
The following MASSBUS Device Error is from a TU78 magnetic tape drive:
FULL
***********************************************
MASSBUS DEVICE ERROR
LOGGED ON Mon 31 Aug 81 15:42:02 MONITOR UPTIME WAS 0:08:46
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 161.
***********************************************
UNIT NAME: MT000
UNIT TYPE: TU78
UNIT SERIAL #: 0175.
VOLUME ID:
LOCATION: RECORD # 1. OF FILE # 0.
USER'S LOGGED IN DIR NUMBER: 5
USER'S PGM: SYSJOB
OPERATION AT ERROR: DEV.AVAIL. GO + READ FWD(70)
FINAL ERROR STATUS: 0,0
RETRIES PERFORMED: 0.
ERROR: NON-RECOVERABLE
DRIVE EXCEPTION,CHN ERROR, IN CONTROLLER CONI
M8960 u-CODE REVISION LEVELS:
0 ( 0- 3777) 005
1 ( 4000- 7777) 005
2 (10000-13777) 005
3 (14000-17777) 003
4 (20000-23777) 002
5 (24000-27777) 003
6 (30000-33777) 007
7 (34000-37777) 003
CONTROLLER INFORMATION:
CONTROLLER: RH20 # 0
CONI AT ERROR: 0,222415 =
DRIVE EXCEPTION,CHN ERROR,
CONI AT END: 0,222415 =
DRIVE EXCEPTION,CHN ERROR,
DATAI PTCR AT ERROR: 732200,177771
DATAI PTCR AT END: 732200,177771
DATAI PBAR AT ERROR: 720000,113000
DATAI PBAR AT END: 720000,113000
CHANNEL INFORMATION:
CHAN STATUS WD 0: 200000,272774
CW1: 0,0 CW2: 0,0
CHN STATUS WD 1: 540100,272775 =
NOT SBUS ERR,NOT WC = 0,LONG WC ERR,
CHN STATUS WD 2: 420003,170000
DEVICE REGISTER INFORMATION:
AT ERROR AT END DIFF.
CMD 00: 4070 4070 0
DEV.AVAIL. READ FWD(70)
DST 01: 4415 4415 0
Interrupt code: NOT CAPABLE
ID Burst neither PE or GCR
CNT 02: 30004 30004 0
SKIP COUNT = 0. RECORD COUNT = 1. DRIVE # 0
DG1 03: 0 0 0
ATN 04: 0 0 0
BCT 05: 113000 113000 0
38400. BYTES
DTR 06: 142101 142101 0
STA 07: 166200 166200 0
RDY, PRES, ONL, PE, BOT, AVAIL,
SER 10: 565 565 0
DG2 11: 0 0 0
DG3 12: 0 0 0
NST 13: 1 1 0
Interrupt code: DONE
Extended sense data not updated
NC1 14: 406 406 0
CMD COUNT = 1. Rewind(06)
NC2 15: 10 10 0
CMD COUNT = 0. Sense(10)
NC3 16: 10 10 0
CMD COUNT = 0. Sense(10)
NC4 17: 10 10 0
CMD COUNT = 0. Sense(10)
MPA 20: 2034 2034 0
MPD 21: 100000 100000 0
EXTENDED SENSE BYTE DATA NOT SUPPLIED FOR THIS ENTRY
DEVICE STATISTICS AT TIME OF ERROR:
# OF READS: 0. # OF WRITES: 0. # OF SEEKS: 0.
# SOFT READ ERRORS: 0. # SOFT WRITE ERRORS: 0.
# HARD READ ERRORS: 1. # HARD WRITE ERRORS: 0.
# SOFT POSITIONING ERRORS: 0.
# HARD POSITIONING ERRORS: 0.
# OF MPE: 0. # OF NXM: 0. # OF OVERRUNS: 0.
The soft read errors and hard read errors in this entry are counted
since the last volume mount.
SHORT
161. 15:42:02 MT000 TU78 SERIAL #0175. OPERATOR RUNNING SYSJOB
CONI RH= 0,222415 CHN STS= 540100,272775 SR= 0,4415
ER= 0,30004 FILE/RECORD 0./1.
5.3.4 DX20 Device Error
When the monitor detects an error in any portion of the MASSBUS system
connected to the DX20 tape controller, the DX20 Device Error is
recorded in the system event file.
This entry contains the octal values of the CONI and DATAI from the
controller both when the error was first detected and after the last
retry.
5.3.5 Drive Statistics Entries
Drive Statistics Entries are written into the system event file to
record the activity on the drive. For example, mounts and dismounts,
reloads, and drive shutdowns are recorded as drive statistics.
FULL
***********************************************
DRIVE STATISTICS ENTRIES
LOGGED ON 5-Oct 10:52:28 MONITOR UPTIME WAS 367.
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 361.
***********************************************
Volume ID: SPARE Reason recorded: Disk pack mount
Channel info(CDB): RH20 # 4 on PI level 5
Device info(UDB): RP20, DP401 PIA: 0
READS WRITES SEEKS
TOTAL : 8. 1.
***********************************************
DRIVE STATISTICS ENTRIES
LOGGED ON 5-Oct 11:20:24 MONITOR UPTIME WAS 5454.
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 374.
***********************************************
Volume ID: CDM Reason recorded: Magtape unload
Channel info(CDB): RH20 # 3 on PI level 5
Device info(UDB): TU70, MTA1, MT301 PIA: 0
READS WRITES
TOTAL : 353600. 7610560.
NRZI :
PE : 353600. 7610560.
GCR :
SHORT
361. 10:52:28 STATS DRIVE: DP401 VOLID: SPARE REASON: Disk pack mount.
374. 11:20:24 STATS DRIVE: MT301 VOLID: CDM REASON: Magtape unload.
5.3.6 Configuration Status Change
The monitor records a Configuration Status Change when the system
operator takes disk units and/or sections of core memory on-line or
off-line, thus changing the configuration of the system. The system
operator can give a 2-character reason for the change in
configuration. The following codes are suggested:
PM - preventive maintenance
CM - corrective maintenance
DN - unit is down
OT - other
This entry lists what device was affected, what action was taken, and
where the action was performed (channel number, controller number,
unit number).
CAUTION
When the system operator adds memory to the system,
the monitor checks to verify the availability of the
specified addresses. Mistakes are reported at the
operator's terminal (CTY); however, the error-logging
system treats these as valid NXMs (nonexistent-memory
errors) and records them as NXM entries. You can
identify a NXM entry of this type by the fact that no
physical memory is off-line and the user's directory
is [1,2].
FULL
***********************************************
CONFIGURATION STATUS CHANGE
LOGGED ON Mon 23 Jun 80 08:50:21 MONITOR UPTIME WAS 2 DAYS 8:34:54
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 1.
***********************************************
DETACH TU72 S/N:28410
AS MTA2 AT CHANNEL #0 CONTROLLER #0 UNIT #2
REASON:
SHORT
SEQ TIME Mon 23 Jun 80
1. 08:50:21 DETACH TU72 S/N:28410 AS MTA2 AT CHANNEL #0 CONTROLLER #0
UNIT #2 REASON:
5.3.7 System Log Entry
The monitor records a System Log Entry when the system operator enters
a log entry into the system event file with the OPR program.
A system operator, or anyone with operator privileges, can make an
entry into the system event file by doing the following:
1. Run the OPR program
@OPR<RET>
OPR>
2. When you see the prompt, specify the REPORT command:
OPR>REPORT<RET>
3. Use the following syntax:
OPR>REPORT user text <RET>
where user is a directory name and/or device name, and text
can be a single-line or multiple-line response.
For more information on OPR, refer to the TOPS-20 Operator's Command
Language Reference Manual.
FULL
***********************************************
SYSTEM LOG ENTRY
LOGGED ON Tue 1 Jul 80 11:37:37 MONITOR UPTIME WAS 0:09:48
DETECTED ON SYSTEM # 2116.
RECORD SEQUENCE NUMBER: 32.
***********************************************
ENTRY CREATED BY:
JOB #, TTY #: 11,17
DIRECTORY: SCHMITT
WHO: SCHMIT
DEV: NUL
MESSAGE: : testing
SHORT
SEQ TIME Tue 1 Jul 80
32. 11:37:37 SYSTEM LOG ENTRY BY SCHMIT FOR DEVICE NUL ON TTY # 17
MESSAGE: : testing
5.3.8 Front-End Device Report
| You find a Front-End Device Report in the system event file when the
front end passes a packet of error information to the monitor across
the DTE-20. This information contains errors detected by the front
end and KLCPU hardware and software. Currently, entries are created
for the following devices: LP20, CD20, DH11, KLCPU, KLERROR, and
KLINIK.
If the FORK # and JOB # associated with the error are 777777,777777,
this indicates that the TOPS-20 monitor knows of this device but it is
not currently assigned to any fork or job. If the FORK # and JOB #
are 777776,777776, this indicates that the monitor does not know
anything about this device.
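The two sentinel values are easy to test for when post-processing listings. The following Python sketch is illustrative only; it is not part of SPEAR, and `describe_owner` is a hypothetical name. It takes the FORK # and JOB # as the octal strings printed in an entry.

```python
def describe_owner(fork: str, job: str) -> str:
    """Interpret the octal FORK # and JOB # printed in a front-end entry."""
    pair = (int(fork, 8), int(job, 8))
    if pair == (0o777777, 0o777777):
        # The monitor knows the device, but no fork or job owns it.
        return "known device, not assigned to any fork or job"
    if pair == (0o777776, 0o777776):
        # The monitor has no information about the device at all.
        return "device unknown to the monitor"
    return f"fork {pair[0]:o}, job {pair[1]:o}"

print(describe_owner("777777", "777777"))
print(describe_owner("53", "17"))  # fork 53, job 17
```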
The front end generates a standard-status word for each transfer
across the DTE-20. The ERROR LOG REQUEST bit in this word causes the
packet to be recorded into the system event file.
The information in the entry varies depending on the type of device
being reported on. If SPEAR does not know how to list a device, the
entry says so and lists the data in octal.
5.3.9 Front End Reloaded
Each time the KLCPU detects that the front end has halted or is in a
loop, a Front End Reloaded entry is recorded in the system event file.
The KL attempts to copy a crash dump file onto disk from the front
end's memory and then reboots the front end.
The front-end number is the logical address of the front end and
indicates whether this front end is privileged. The status at reload
describes, in text, any errors that occurred during the reboot
process. The file name of the core dump is listed if the crash dump
was successful.
FULL
***********************************************
FRONT END RELOADED
LOGGED ON Tue 1 Jul 80 00:18:51 MONITOR UPTIME WAS 0:02:24
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 126.
***********************************************
FRONT END #: 0
STATUS AT RELOAD: NO ERROR BITS DETECTED
RETRIES: 0
REASON FOR RELOAD: B03
FILENAME FOR DUMP: <SYSTEM>0DUMP11.BIN.17, 1-Jul-80 00:18:45
SHORT
SEQ TIME Tue 1 Jul 80
126. 00:18:51 FRONT END RELOAD ON PDP11 #0 RELOAD STATUS,,RETRIES 0,0
PDP11 HALT CODE B03
5.3.10 Processor Parity Trap
The monitor records a Processor Parity Trap each time a page-fail trap
occurs in the CPU as a result of an AR, ARX, or PAGE TABLE parity
error.
The information in the GOOD DATA WORD is valid only if the error is
recoverable; otherwise, the GOOD DATA WORD is 0,0 and the DIFFERENCE
is a copy of the BAD DATA WORD. The DIFFERENCE is the result of an XOR
between the bad data and the good data words. Note that if the user
is unknown, the FORK and JOB numbers are 777777,777777.
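The relationship between the three data words can be checked by hand. This Python sketch (not part of SPEAR; the helper names are hypothetical) parses the `left,right` octal halfword notation used in the entries and recomputes the DIFFERENCE with a bitwise XOR, using the BAD DATA WORD and GOOD DATA WORD values from the FULL example.

```python
HALF = 0o777777  # mask for one 18-bit halfword

def parse_halfwords(text: str) -> int:
    """Parse 'left,right' octal halfwords into one 36-bit integer."""
    left, right = text.split(",")
    return (int(left, 8) << 18) | int(right, 8)

def halfwords(word: int) -> str:
    """Format a 36-bit integer as 'left,right' octal halfwords."""
    return f"{(word >> 18) & HALF:o},{word & HALF:o}"

def difference(bad: int, good: int) -> int:
    """Bits that differ between the bad and good data words (XOR)."""
    return bad ^ good

bad = parse_halfwords("252525,252525")
good = parse_halfwords("525252,525252")
print(halfwords(difference(bad, good)))  # 777777,777777

# Unrecoverable error: GOOD DATA WORD is 0,0, so the DIFFERENCE
# is simply a copy of the BAD DATA WORD.
print(halfwords(difference(bad, 0)))     # 252525,252525
```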
FULL
***********************************************
PROCESSOR PARITY TRAP
LOGGED ON Tue 8 Jul 80 11:14:04 MONITOR UPTIME WAS 8:51:58
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 320.
***********************************************
STATUS AT ERROR:
BAD DATA DETECTED BY: AR
PAGE FAIL WD AT TRAP: 763000,313
BAD DATA WORD: 252525,252525
GOOD DATA WORD: 525252,525252
DIFFERENCE: 777777,777777
PHYSICAL MEM ADDR.
AT FAILURE: 563003,277313
RECOVERY: CONT. USER
RETRY COUNT: 1.
CACHE IN USE
FORK # & JOB #: 53,17
USER'S LOGGED IN DIR: EIBEN
PROGRAM NAME: KLPAR1
SHORT
SEQ TIME Tue 8 Jul 80
320. 11:14:04 PARITY TRAP PAGE FAIL WORD;763000,313
PHYSICAL MEMORY ADDRESS;563003,277313
FAILURE TYPE,,RETRIES;40000,1
5.3.11 Processor Parity Interrupt
When the monitor detects an APR interrupt because of a parity error,
it records a Processor Parity Interrupt in the system event file. It
records the entry after it has scanned all physical memory looking for
more errors. If the original error also generates a page-fail trap,
the monitor also creates a Processor Parity Trap entry.
The CONI APR and ERA values are the contents of these registers at the
time of the first error. The PC AT INTERRUPT value includes the flags
in the left half. The BASE PHYsical MEMory ADDRess AT FAILURE is from
the right half of the contents of the ERA.
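In the FULL example that follows, the ERA is 36001,520314 and the BASE PHY. MEM ADDR. AT FAILURE is 1520314, which is wider than the ERA's right half (520314). The sample values are consistent with the address being the low-order 22 bits of the ERA, the width of a KL10 physical address. The Python sketch below assumes that interpretation, which is inferred from the example rather than stated in this manual; the helper names are hypothetical.

```python
PHYS_ADDR_BITS = 22  # assumption: width of a KL10 physical address
PHYS_ADDR_MASK = (1 << PHYS_ADDR_BITS) - 1

def parse_halfwords(text: str) -> int:
    """Parse 'left,right' octal halfwords into one 36-bit integer."""
    left, right = text.split(",")
    return (int(left, 8) << 18) | int(right, 8)

def base_phys_addr(era: int) -> int:
    """Low-order 22 bits of the ERA (interpretation inferred from the example)."""
    return era & PHYS_ADDR_MASK

era = parse_halfwords("36001,520314")
print(f"{base_phys_addr(era):o}")  # 1520314
print(f"{era & 0o777777:o}")       # 520314 (right half only)
```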
The # OF ERRORS value is the number of parity errors found during
this sweep of physical memory. If the value is zero, the monitor
detected no errors; in that case the logical AND of both the bad
addresses and the bad data is 777777,777777, and the logical OR is
0,0.
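The AND and OR values behave like running accumulations, which explains the values reported for an error-free sweep: the AND starts with every bit set and the OR starts at zero, so with no errors to fold in they remain 777777,777777 and 0,0. This Python sketch (hypothetical helper names; not SPEAR code) reproduces the bad-address values from the FULL example.

```python
ALL_ONES = (1 << 36) - 1  # a 36-bit word with every bit set
HALF = 0o777777

def halfwords(word: int) -> str:
    """Format a 36-bit integer as 'left,right' octal halfwords."""
    return f"{(word >> 18) & HALF:o},{word & HALF:o}"

def and_or(values):
    """Return (AND, OR) of the values; the identities if there are none."""
    acc_and, acc_or = ALL_ONES, 0
    for value in values:
        acc_and &= value
        acc_or |= value
    return acc_and, acc_or

bad_addresses = [0o1520304, 0o1520314]
acc_and, acc_or = and_or(bad_addresses)
print(halfwords(acc_and), halfwords(acc_or))  # 1,520304 1,520314
print(*map(halfwords, and_or([])))            # 777777,777777 0,0
```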
The SYSTEM MEMORY CONFIGURATION lists the physical memory
configuration and any detected errors at the time of the first error.
These are the results of S-BUS DIAGNOSTIC FUNCTIONS for all memory
controllers on this CPU.
FULL
***********************************************
PROCESSOR PARITY INTERRUPT
LOGGED ON Tue 8 Jul 80 11:21:35 MONITOR UPTIME WAS 8:59:29
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 323.
***********************************************
CONI APR: 7740,413 = MB PAR ERR,
ERA: 36001,520314 = WD #0 CACHE WRITE
BASE PHY. MEM ADDR.
AT FAILURE: 1520314
PC FLAGS AT INTERRUPT: 300000,0
PC AT INTERRUPT: 67320
# ERRORS ON THIS SWEEP 2.
LOGICAL AND OF
BAD ADDRESSES: 1,520304
LOGICAL OR OF
BAD ADDRESSES: 1,520314
LOGICAL AND OF
BAD DATA: 252525,252525
LOGICAL OR OF
BAD DATA: 252525,252525
SYSTEM MEMORY CONFIGURATION:
CONTROLLER: #0 MB20 128 K
F0: 6000,0 F1: 36300,36012
INTERLEAVE MODE: 4-WAY
REQ ENABLED: 0 2
LOWER ADDRESS BOUNDARY: 0
UPPER ADDRESS BOUNDARY: 777777
ERRORS DETECTED: NONE
CONTROLLER: #1 MB20 128 K
F0: 6000,0 F1: 36300,36005
INTERLEAVE MODE: 4-WAY
REQ ENABLED: 1 3
LOWER ADDRESS BOUNDARY: 0
UPPER ADDRESS BOUNDARY: 777777
ERRORS DETECTED: NONE
CONTROLLER: #2 MB20 128 K
F0: 6000,0 F1: 36301,36012
INTERLEAVE MODE: 4-WAY
REQ ENABLED: 0 2
LOWER ADDRESS BOUNDARY: 1000000
UPPER ADDRESS BOUNDARY: 1777777
ERRORS DETECTED: NONE
CONTROLLER: #3 MB20 128 K
F0: 6000,0 F1: 36301,36005
INTERLEAVE MODE: 4-WAY
REQ ENABLED: 1 3
LOWER ADDRESS BOUNDARY: 1000000
UPPER ADDRESS BOUNDARY: 1777777
ERRORS DETECTED: NONE
CONTROLLER: #10 MF20
F0: 26123,277313 F1: 500,1000
LAST WORD REQUEST: RQ3 WRITE
LAST ADDRESS HELD: 3277313
CONTROLLER STATUS: SF2 & SF1= 2
ERRORS DETECTED: WRITE PARITY
CONTROLLER: #11 MF20
F0: 7747,631734 F1: 500,1000
LAST WORD REQUEST: RQ0RQ1RQ2RQ3- READ
LAST ADDRESS HELD: 7631734
CONTROLLER STATUS: SF2 & SF1= 2
ERRORS DETECTED: NONE
ERRORS DETECTED DURING SWEEP:
ADDRESS BAD DATA GOOD DATA DIFFERENCE
1520304 252525,252525 GOOD DATA NOT FOUND
1520314 252525,252525 GOOD DATA NOT FOUND
SHORT
SEQ TIME Tue 8 Jul 80
323. 11:21:35 PARITY INTERRUPT-CONI APR;7740,413 ERA;36001,520314
PC AT INTERRUPT;0,67320 # OF ERRORS;2.
5.3.12 KL CPU Status Block
This entry is written into ERROR.SYS on TOPS-20 if KLSTAT is turned
on at the time of a system crash. (See Section 4.5.1 for this
procedure.)
At the time of a crash, a snapshot of the condition of all the
components of the CPU (such as controllers, channels, RH20s, the
pager, and so forth) is taken. When the system recovers, this
information is extracted from the CRASH.EXE file and written as an
entry in ERROR.SYS. This entry displays the condition of the
registers and channels at the time of the crash.
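Each word in the status block listing is printed as two six-digit octal halfwords separated by `,,`. When you compare these dumps across crashes, it can help to convert them to integers. The following is a minimal Python sketch with hypothetical helper names; it compares the first two words of the CONI DTE0 THRU DTE3 row from the FULL example.

```python
HALF = 0o777777  # mask for one 18-bit halfword

def parse_word(text: str) -> int:
    """Parse 'LLLLLL,,RRRRRR' (two octal halfwords) into a 36-bit integer."""
    left, right = text.split(",,")
    return (int(left, 8) << 18) | int(right, 8)

def format_word(word: int) -> str:
    """Format a 36-bit integer in the same zero-padded ',,' style."""
    return f"{(word >> 18) & HALF:06o},,{word & HALF:06o}"

def parse_row(line: str) -> list[int]:
    """Parse one row of the listing: whitespace-separated words."""
    return [parse_word(field) for field in line.split()]

row = parse_row("000000,,001016 000000,,101016 000000,,002000 000000,,002000")
# Bits that differ between the DTE0 and DTE1 CONI values.
print(format_word(row[0] ^ row[1]))  # 000000,,100000
```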
FULL
***********************************************
KL CPU STATUS BLOCK
LOGGED ON Mon 15 Sep 80 15:03:19 MONITOR UPTIME WAS 17:49:02
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 26.
***********************************************
APRID = 600236,364131
CONI APR = 7740,3
RDERA = 202000,132276
CONI PI = 0,2377
DATAI PAG = 701000,3201
CONI PAG = 0,660124
CONI RH0 THRU RH7
000000,,002445 000000,,002445 000000,,002445 000000,,002445
000000,,002000 000000,,002000 000000,,002000 000000,,002000
CONI DTE0 THRU DTE3
000000,,001016 000000,,101016 000000,,002000 000000,,002000
EPT LOCATIONS 0 THRU 37 (CHANNEL LOGOUT AREA)
200000,,225566 540100,,225567 620003,,477000 254340,,726001
200000,,074442 500000,,074443 600000,,460000 254340,,726421
200000,,075064 500000,,075065 600001,,053000 254340,,727011
200000,,075522 500000,,075523 600001,,573000 254340,,727501
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
EPT LOCATIONS 140 THRU 177 (DTE CONTROL BLOCKS)
241000,,223711 241000,,730250 254340,,002135 000000,,000000
000000,,000000 000000,,223434 000000,,000030 000000,,223516
000000,,000000 041000,,731556 254340,,002147 000000,,000000
000000,,000226 000000,,223433 000000,,000030 000000,,223546
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
000000,,000000 000000,,000000 000000,,000000 000000,,000000
UPT LOCATIONS 424 THRU 427 (UUO AREA)
310100,,057200 000000,,700000 000000,,000000 601000,,003201
UPT LOCATIONS 500 THRU 503 (PAGE FAIL AREA)
411000,,742000 000000,,000162 000006,,611327 000000,,027543
AC BLOCK 6 LOCATIONS 0 THRU 3 AND 12
000770,,000007 301000,,002520 000000,,127000 000000,,153764
011003,,276223
AC BLOCK 7 LOCATIONS 0 THRU 2
000000,,000000 000000,,000000 000000,,000000
SBDIAG FUNCTIONS
CTRLR FUNCTION 0 FUNCTION 1
0 006000,,000000 036300,,036012
1 006000,,000000 036300,,036005
10 007743,,201500 000500,,001000
SHORT
SEQ TIME Mon 15 Sep 80
26. 15:03:19 KL CPU STATUS BLOCK APRID = 600236,364131
CONI APR = 7740,3 RDERA = 202000,132276
CONI PAG = 0,660124 DATAI PAG = 701000,3201
5.3.13 MF20 Device Report
This entry is written to ERROR.SYS when a MOS memory error occurs.
Each time such an error occurs, the monitor runs a program called
TGHA, which is responsible for recovering from the error. If TGHA
places memory off-line or substitutes a spare bit, these events are
recorded as an entry in ERROR.SYS. The entry is an ASCII text report
from TGHA describing the attempt to recover from the error in MOS
memory.
FULL
***********************************************
MF20 DEVICE REPORT
LOGGED ON Mon 30 Jun 80 10:02:41 MONITOR UPTIME WAS 1 DAY 11:39:06
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 21.
***********************************************
TEXT FROM TGHA:
A NEW MF20 KNOWN ERROR HAS BEEN DECLARED. DATA:
STORAGE MODULE SERIAL NUMBER: 8320021
BLOCK: 3, SUBBLOCK: 1, BIT IN FIELD (10): 5,
ROW: 174, COLUMN: 52, E NUMBER: 109, ERROR TYPE: CELL
SHORT
SEQ TIME Mon 30 Jun 80
21. 10:02:41 MF20 REPORT
5.3.14 KLERR Front End Device Report
A KLERR Front End Device Report is written into the system event file
when the KL clock stops because of any of several errors (FAST MEMORY,
PARITY ERRORS, CRAM PARITY ERROR, DRAM PARITY ERROR, or FIELD SERVICE
STOP). Any significant error signal is listed just after the header.
| 5.4 DECNET ENTRIES (V2.1)
The following sections list both the FULL and SHORT versions of the
network entries (Version 2.1) that TOPS-10 or TOPS-20 can record in
the system event file.
5.4.1 Network Control Started
Whenever NETCON is loaded and started, the monitor records a Network
Control Started entry into the system event file. This entry includes
the version number and the node on which NETCON is running.
FULL
***********************************************
NETWORK CONTROL STARTED
LOGGED ON Mon 23 Jun 80 11:37:08 MONITOR UPTIME WAS 2 DAYS 11:21:41
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 15.
***********************************************
PROGRAM NAME: NETCON
PROGRAM VERSION: 4(22)
NODE NAME: KL2137
SHORT
SEQ TIME Mon 23 Jun 80
15. 11:37:08 NCU STARTED PROGRAM: NETCON VER:4(22)
STARTED ON NODE KL2137
5.4.2 Network Up-Line Dump
Whenever NETCON dumps a node, the monitor records the name of the node
involved, the line used, the dump-file specification, and any return
code as a Network Up-Line Dump entry in the system event file.
FULL
***********************************************
NETWORK UP-LINE DUMP
LOGGED ON Mon 23 Jun 80 11:07:53 MONITOR UPTIME WAS 2 DAYS 10:52:26
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 11.
***********************************************
TARGET NODE NAME: DN20L
SERVER NODE NAME: KL2137
SERVER LINE DESIG.: DTE20_1_0
FILE NAME DUMPED: PS:<SROBINSON>DN20L-R4-26.DMP
SHORT
SEQ TIME Mon 23 Jun 80
11. 11:07:53 UP-LINE DUMP OF NODE DN20L BY NODE KL2137
LINE DESIGNATION DTE20_1_0
FILE DUMPED TO PS:<SROBINSON>DN20L-R4-26.DMP
5.4.3 Network Down-Line Load
Whenever NETCON loads a node, the monitor records the name of the node
involved, the line used, the load-file specification, and any return
code as a Network Down-Line Load entry in the system event file.
FULL
***********************************************
NETWORK DOWN-LINE LOAD
LOGGED ON Mon 23 Jun 80 11:10:33 MONITOR UPTIME WAS 2 DAYS 10:55:06
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 13.
***********************************************
TARGET NODE NAME: DN20L
SERVER NODE NAME: KL2137
SERVER LINE DESIG.: DTE20_1_0
FILE NAME LOADED: PS:<NEXT-RELEASE>DN20L-R4-26.SYS.1
SHORT
SEQ TIME Mon 23 Jun 80
13. 11:10:33 DOWN-LINE LOAD OF NODE DN20L BY NODE KL2137
LINE DESIGNATION DTE20_1_0
FILE LOADED PS:<NEXT-RELEASE>DN20L-R4-26.SYS.1
5.4.4 Network Hardware Error
Whenever NETCON detects an error in any hardware device connected to a
node, the monitor records this information as a Network Hardware Error
in the system event file.
5.4.5 Network CHECK11 Report
Whenever a DN20 or DN200 is loaded, CHECK11 (a hardware test module)
is started. All messages that CHECK11 produces during that run become
one entry in the system event file.
Note that the log data in this entry is an ASCIZ CHECK11 message of
arbitrary length.
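An ASCIZ string on the PDP-10 is stored as 7-bit ASCII characters packed five to a 36-bit word, left-justified with the low-order bit unused, and terminated by a NUL character. Assuming the CHECK11 text uses that standard packing, the following Python sketch unpacks it. `pack_asciz` exists only to build sample input, and both function names are hypothetical.

```python
def unpack_asciz(words: list[int]) -> str:
    """Decode 7-bit characters packed five per 36-bit word, up to the NUL."""
    chars = []
    for word in words:
        for position in range(5):
            # Character 0 occupies the top 7 bits; bit 35 (lowest) is unused.
            code = (word >> (29 - 7 * position)) & 0o177
            if code == 0:
                return "".join(chars)
            chars.append(chr(code))
    return "".join(chars)  # no NUL seen: return everything decoded

def pack_asciz(text: str) -> list[int]:
    """Inverse of unpack_asciz, used here only to build sample input."""
    codes = [ord(c) & 0o177 for c in text] + [0]
    codes += [0] * (-len(codes) % 5)  # pad the last word with NULs
    words = []
    for start in range(0, len(codes), 5):
        word = 0
        for position, code in enumerate(codes[start:start + 5]):
            word |= code << (29 - 7 * position)
        words.append(word)
    return words

words = pack_asciz("CHK11 complete")
print(len(words), unpack_asciz(words))  # 3 CHK11 complete
```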
FULL
***********************************************
NETWORK CHECK11 REPORT
LOGGED ON Mon 23 Jun 80 11:09:56 MONITOR UPTIME WAS 2 DAYS 10:54:28
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 12.
***********************************************
MSG SENT FROM: KL2137
MSG REC'D AT: KL2137
HDWR TYPE: UNKN SOFTWARE TYPE: UNKN
PARENT SYSTEM TYPE: UNKN
MSG SEQUENCE # FROM XMIT NODE: 2.
TEXT FROM CHK11 REPORT:
CHK11 HARDWARE TEST
version 2A(21) of 10-AUG-79 by LDW
Testing begins...
THE PROCESSOR SEEMS TO BE A KD11-E (11/34)
CHK11 EXPECTED AN 11/34
KT11 memory management test
PHYSICAL MEMORY HAS ABSOLUTE LIMITS OF
0 - 757777
FOR A TOTAL OF 124KW (DECIMAL)
MAPPED PHYSICAL MEMORY TEST...
...COMPLETE
KW11-L checked
device scan report assumes
DN20
DN21
DN25 fixed assignments (no floating)
1 Fixed DTE20 at 174440, vector at 774
1 Fixed KMC11 at 160540, vector at 540
2 Fixed DUP11s from 160300, vector at 570
2 Fixed DMC11s from 160740, vector at 670
CHK11 complete
SHORT
SEQ TIME Mon 23 Jun 80
12. 11:09:56 NETWORK CHECK11 REPORT
5.4.6 Network Line Statistics
Periodically, NETCON records the status of each communications line,
and this information becomes an entry in the system event file.
FULL
***********************************************
NETWORK LINE STATISTICS
LOGGED ON Mon 16 Jun 80 08:34:19 MONITOR UPTIME WAS 0:34:48
DETECTED ON SYSTEM # 2137.
RECORD SEQUENCE NUMBER: 1.
***********************************************
MSG SENT FROM: DN20L
MSG REC'D AT: KL2137
HDWR TYPE: DTE-20 SOFTWARE TYPE: UNKN
PARENT SYSTEM TYPE: UNKN
LINE ID: DTE_1_0_0
REASON FOR ENTRY: PERIODIC ENTRY
1802. SECONDS SINCE LAST ZEROED
808. BLOCKS RECEIVED
814. BLOCKS SENT
0. NON - LINE ERROR RETRANSMISSIONS
SHORT
SEQ TIME Mon 16 Jun 80
1. 08:34:19 NETWORK LINE COUNTERS FROM NODE DN20L FOR LINE DTE_1_0_0
LINE ERROR RETRANS RECV LINE ERRORS
| 5.5 DECNET ENTRIES (V3.0)
The DECnet V3.0 Event Logger module records significant network
events in the system event file. The headers for DECnet V3.0
entries have the title:
PHASE III DECNET ENTRY
The body of each entry contains numbers that correspond to specific
event classes and event types. Tables 5-1 and 5-2 list the meaning of
the numbers in the entry. Refer to Section 4.3.3 for information on
how to RETRIEVE network entries by event class.
Table 5-1: Network Event Classes

Event Class   Description
0             Network Management Layer
1             Applications Layer
2             Session Control Layer
3             Network Services Layer
4             Transport Layer
5             Data Link Layer
6             Physical Link Layer
7-31          Reserved for other common event classes
32-63         Reserved for RSTS specific event classes
64-95         Reserved for RSX specific event classes
96-127        Reserved for TOPS-20 specific event classes
128-159       Reserved for VMS specific event classes
160-191       Reserved for RT specific event classes
192-479       Reserved for future use
480-511       Reserved for Customer specific event classes
Table 5-2: Network Events

Class  Type  Entity         Event Text
0      0     none           Event records lost
0      1     node           Automatic node counters
0      2     line,circuit   Automatic data link counters
0      3     line,circuit   Automatic data link service
0      4     line,circuit   Data link counters zeroed
0      5     node           Node counters zeroed
0      6     line,circuit   Passive loopback
0      7     line,circuit   Aborted service request
2      0     none           Local node state change
2      1     none           Access control reject
3      0     none           Invalid message
3      1     none           Invalid flow control
3      2     node           Data base reused
4      0     none           Aged packet loss
4      1     circuit        Node unreachable packet loss
4      2     circuit        Node out-of-range packet loss
4      3     circuit        Oversized packet loss
4      4     circuit        Packet format error
4      5     circuit        Partial routing update loss
4      6     circuit        Verification reject
4      7     circuit        Circuit down, circuit fault
4      8     circuit        Circuit down, software fault
4      9     circuit        Circuit down, operator fault
4      10    circuit        Circuit up
4      11    circuit        Initialization failure, circuit fault
4      12    circuit        Initialization failure, software fault
4      13    circuit        Initialization failure, operator fault
4      14    node           Node reachability change
5      0     line,circuit   Locally initiated state change
5      1     line,circuit   Remotely initiated state change
5      2     line,circuit   Protocol restart received in maintenance mode
5      3     line,circuit   Send error threshold
5      4     line,circuit   Receive error threshold
5      5     line,circuit   Select error threshold
5      6     line,circuit   Block header format error
5      7     line,circuit   Selection address error
5      8     line,circuit   Streaming tributary
5      9     line,circuit   Local buffer too small
6      0     line           Data set ready transition
6      1     line           Ring indicator transition
6      2     line           Unexpected carrier transition
6      3     line           Memory access error
6      4     line           Communications interface error
6      5     line           Performance error
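An event type such as 4.14 is read as class 4, type 14: the class identifies a row in Table 5-1, and the class and type together identify a row in Table 5-2. The following Python sketch performs that lookup. It includes only a few Table 5-2 rows, and the function names are hypothetical.

```python
EVENT_CLASSES = {
    0: "Network Management Layer",
    1: "Applications Layer",
    2: "Session Control Layer",
    3: "Network Services Layer",
    4: "Transport Layer",
    5: "Data Link Layer",
    6: "Physical Link Layer",
}

# Reserved ranges from Table 5-1 (inclusive bounds).
RESERVED_RANGES = [
    (7, 31, "Reserved for other common event classes"),
    (32, 63, "Reserved for RSTS specific event classes"),
    (64, 95, "Reserved for RSX specific event classes"),
    (96, 127, "Reserved for TOPS-20 specific event classes"),
    (128, 159, "Reserved for VMS specific event classes"),
    (160, 191, "Reserved for RT specific event classes"),
    (192, 479, "Reserved for future use"),
    (480, 511, "Reserved for Customer specific event classes"),
]

# A few rows of Table 5-2, keyed by (class, type); extend as needed.
EVENTS = {
    (4, 10): "Circuit up",
    (4, 14): "Node reachability change",
    (5, 3): "Send error threshold",
}

def class_name(event_class: int) -> str:
    """Look up a class number in Table 5-1."""
    if event_class in EVENT_CLASSES:
        return EVENT_CLASSES[event_class]
    for low, high, text in RESERVED_RANGES:
        if low <= event_class <= high:
            return text
    return "Unknown event class"

def describe(event_type: str) -> tuple[str, str]:
    """Split 'class.type' (for example '4.14') and look up both numbers."""
    class_text, type_text = event_type.split(".")
    cls, typ = int(class_text), int(type_text)
    return class_name(cls), EVENTS.get((cls, typ), "Not in this table")

print(describe("4.14"))  # ('Transport Layer', 'Node reachability change')
```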
| The following are examples of three DECnet Version 3.0 entries
in FULL format:
***********************************************
PHASE III DECNET ENTRY
LOGGED ON 7-Dec 03:01:49 MONITOR UPTIME WAS 0 DAY(S) 9:9:33
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 19.
***********************************************
Event type 4.10 Line up
From node 118. (MCB), occurred 7-DEC-1981 0:00:00.400
CIRCUIT = DMC-0
NODE = 121
***********************************************
PHASE III DECNET ENTRY
LOGGED ON 7-Dec 03:01:50 MONITOR UPTIME WAS 0 DAY(S) 9:9:35
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 20.
***********************************************
Event type 4.14 Node reachability change
From node 118. (MCB), occurred 7-DEC-1981 0:00:00.466
REMOTE NODE = 103 ()
STATUS = REACHABLE
***********************************************
PHASE III DECNET ENTRY
LOGGED ON 7-Dec 03:02:02 MONITOR UPTIME WAS 0 DAY(S) 9:9:47
DETECTED ON SYSTEM # 2102.
RECORD SEQUENCE NUMBER: 21.
***********************************************
Event type 5.3 Send error threshold
From node 118. (MCB), occurred 7-DEC-1981 0:00:18.000
CIRCUIT = KDP-0-0
The following are the same three DECnet Version 3.0 entries listed in
SHORT format:
19. 03:01:49 DECNET Event type 4.10 Line up
From node 118. (MCB)
occurred 7-DEC-1981 0:00:00.400
20. 03:01:50 DECNET Event type 4.14 Node reachability change
From node 118. (MCB)
occurred 7-DEC-1981 0:00:00.466
21. 03:02:02 DECNET Event type 5.3 Send error threshold
From node 118. (MCB)
occurred 7-DEC-1981 0:00:18.000
| The following DECnet entry lists packet header information:
|
| ***********************************************
| PHASE III DECNET ENTRY
| LOGGED ON 27-Feb-84 07:23:29-EST MONITOR UPTIME WAS 1 DAY(S) 0:2:17
| DETECTED ON SYSTEM # 2871.
| RECORD SEQUENCE NUMBER: 120.
| ***********************************************
|
| Event type 4.1 Node unreachable packet loss
| From node 143. (GIDDN), uptime was 1 day(s) 16:56:39
|
| Packet Header = 2 / 142 / 143 / 6
|
From left to right, the four fields listed with the packet header mean
the following:

Field one (2) is a one-byte hexadecimal value
representing the message flags.

Field two (142) is a two-byte unsigned decimal value
representing the destination node address.

Field three (143) is a two-byte unsigned decimal value
representing the source node address.

Field four (6) is a one-byte hexadecimal value
representing the forwarding data.

Note that if the packet is a control packet, the packet header
contains only two fields: the message flags (field one) and the
source node address (field three).
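The following Python sketch shows how the four fields could be rendered from raw header bytes. This manual gives only the field widths and display radixes; the byte layout, the low-byte-first order of the two-byte node addresses (the usual DECnet convention), and the two-field control-packet layout are assumptions. The function name is hypothetical.

```python
def format_packet_header(raw: bytes, control: bool = False) -> str:
    """Render a packet header as 'flags / dest / source / forwarding'."""
    flags = raw[0]  # one byte, shown in hexadecimal
    if control:
        # Assumed control-packet layout: flags, then a 2-byte source node.
        source = int.from_bytes(raw[1:3], "little")
        return f"{flags:X} / {source}"
    destination = int.from_bytes(raw[1:3], "little")  # assumed low byte first
    source = int.from_bytes(raw[3:5], "little")
    forwarding = raw[5]  # one byte, shown in hexadecimal
    return f"{flags:X} / {destination} / {source} / {forwarding:X}"

header = bytes([0x02, 142, 0, 143, 0, 0x06])
print(format_packet_header(header))  # 2 / 142 / 143 / 6
```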
|
| For more information on network event parameters, see Appendix F.
|
| For more information concerning DECnet Version 3.0 entries, refer to
| the DECnet documentation for system managers and operators.
APPENDIX A
SPEAR MESSAGES
There are four general categories of SPEAR messages: User Validation
Messages, Dialogue Usage Messages, Warning Messages, and Event File
Messages. The following tables list these messages and suggested
actions.
Table A-1: User Validation Messages
The following messages can occur because of an error on the
user's part. Each message is preceded by the header:
| ?USER Validation failed
|
|
| CODE or SEQUENCE not allowed in list of responses.
|
| You have selected CODE or SEQUENCE as a response and have
| attempted to add another selection type.
Does not match any valid response
Typed a response that did not match one of the list of valid
responses.
End time must be later than begin time
Typed an ending date/time that is prior to or the same as the
| beginning date/time in RETRIEVE.
Invalid date format
Typed date incorrectly. The correct format is dd-mmm-yy or
-dd.
Invalid time format
Typed time incorrectly. The correct format is hh:mm:ss.
Matches more than one valid response
Typed a response that was not unique. Need to type more
characters before pressing the RETURN key or ESCAPE key.
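The messages "Does not match any valid response" and "Matches more than one valid response" describe unique-prefix recognition. The following Python sketch illustrates the idea; it is not SPEAR's own recognition code, and `recognize` is a hypothetical name.

```python
def recognize(typed: str, valid: list[str]) -> str:
    """Return the single valid response that `typed` abbreviates."""
    key = typed.upper()
    # An exact match wins even if it is also a prefix of a longer response.
    for response in valid:
        if response.upper() == key:
            return response
    matches = [r for r in valid if r.upper().startswith(key)]
    if not matches:
        raise ValueError("Does not match any valid response")
    if len(matches) > 1:
        raise ValueError("Matches more than one valid response")
    return matches[0]

functions = ["RETRIEVE", "SUMMARIZE", "KLSTAT"]
print(recognize("su", functions))  # SUMMARIZE
```

Typing "S" against STATISTICS and SOFTWARE is ambiguous, so more characters are needed before RETURN or ESCAPE.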
May not select all at this prompt
You tried to select ALL when you must respond with specific
names or numbers.
No recognition for this prompt
Typed ESCAPE key where it is impossible to fill in the
blanks.
Not a valid name or number
If a name, typed a special character or more than the maximum
number of characters. If a number, typed a special character
or alphabetic character or more than the maximum number of
digits.
That function is not available
You typed the name of a function whose program file is not in the
same directory as SPEAR.
Table A-2: Dialogue Usage Messages
The following messages can occur when you are responding to the
dialogue incorrectly. They are meant to give you some insight as
to what the correct response is to the current prompt.
Not one of the recognized types
At RETRIEVE level, when specifying a device, you typed a ?
after typing a few characters. SPEAR did not recognize the
device as one of its physical devices.
Please select function first
Typed a switch that requires some function to have been
selected first (for example, /GO or /SHOW) at the SPEAR>
prompt.
Unable to complete this response
You typed an ESCAPE to a prompt that SPEAR does not know how
to complete. This is true whenever the response is not one
of a fixed list of possible responses, for example, time of
day or file specification.
No default response for this prompt
Typed the ESCAPE key or another delimiter where there is no
default (at SPEAR> prompt, for example).
Table A-3: Warning Messages
The following is a list of warning messages you may receive during
a SPEAR operation. Each message is introduced with the following
sentence:
-- The following should be noted before proceeding --
Impossible to input event records from the terminal!
You specified TTY: in response to a request for a file
specification.
The input file will be superseded!
In RETRIEVE, you named the output file the same name as the
input file. This means you will overwrite your input file if
you proceed.
Will overwrite input file with ASCII output!
In RETRIEVE, you specified the same name for both input and
output files and also specified ASCII as the output format.
If you proceed, the input file (which is binary) will be
overwritten with ASCII output.
Binary output to terminal is unreadable!
In RETRIEVE, you requested the BINARY report format and then
specified TTY: in response to the Output to: prompt.
Merging with self causes duplicate records!
In RETRIEVE, you specified the same name for both the input
file and the merge file. If you proceed, you will end up
with a file containing duplicate records.
Will create an exact copy of the input file!
In RETRIEVE, you selected all the events in the system event
file and then requested them in BINARY format. This is a
waste of effort because all you will have succeeded in doing
is duplicating the system event file.
Will create an empty output file!
In RETRIEVE, you have excluded everything during the
selection process.
This function can cause SEVERE system degradation!
You have turned on the KLSTAT switch, which slows down system
operation in order to gather extra data into the system event file.
Table A-4: Event File Messages
The following messages can occur as the result of an error in the
system event file. These messages indicate recoverable errors.
Each message is preceded by the following header:
%SPEAR Event file error detected in module ____ routine ____
Bad header found - RESYNCHing
Lost synchronization in file, resynchronizing in next file
block. Some data has been lost.
EOF encountered while skipping an entry
Error file is truncated for some reason. Some data has been
lost.
Internal EOF found - RESYNCHing
An internal end-of-file mark was detected, but data follows it. (This
can happen if files are appended to each other.) No data is
lost.
Premature EOF detected in error file!
Encountered an EOF in the middle of a header or entry. File
is truncated. Some data is lost.
You can also receive fatal error messages in the form:
?SPEAR Program error in module ____ routine ____
where the blanks are filled in with the module and routine names.
These are SPEAR program errors over which you have no control. If you
receive such an error, fill out a Software Performance Report
describing the error and the situation leading up to the error.
Another error over which you have no control is an error from an
internal program called XPORT. XPORT does not identify itself in the
message. However, the message is preceded by a question mark,
indicating that this is a fatal error. If you receive an XPORT error
message, you should also fill out a Software Performance Report.
Other possible messages you can receive originate from the operating
system. For example:
?SPEAR Monitor call failed      TOPS-20
?SCNxxx message                 TOPS-10
| On TOPS-20, you should refer to the Monitor Calls Manual for a list of
these messages. On TOPS-10, you should refer to the SCAN
documentation for a list of SCAN messages.
| APPENDIX B
|
| INSTALLATION PROCEDURES
|
|
|
| B.1 INTRODUCTION
|
| SPEAR consists of RETRIEVE, SUMMARIZE, and KLSTAT functions.
|
SPEAR is distributed on the TOPS-10 and TOPS-20 monitor distribution
tape in two savesets, <DOCUMENTATION> and <SUBSYS>, which together
contain all the SPEAR files.
|
|
|
| B.1.1 SPEAR Files
|
| The documentation files included in <DOCUMENTATION> for SPEAR are:
|
| o SPEAR.DOC - SPEAR installation document
|
| o DEFINE.LIS - Event file documentation
|
| The files included in <SUBSYS> for SPEAR are:
|
| o SPEAR.SPE - Help file used during user interface
|
| o SPEAR.EXE - User interface and main control routines
|
| o RFB.EYE - Internal definitions for RETRIEVE package
|
| o MSGARG.SPT - Binary file for RETRIEVE package
|
| o RETRFB.SPE - Text file for RETRIEVE package
|
| o SPRRET.SPE - Text file for RETRIEVE package
|
| o SPRRET.EXE - Error file manipulation and translation package
| for RETRIEVE
|
| o SPRSUM.SPE - Text file for SUMMARIZE package
|
| o SPRSUM.EXE - Device summarization package for SUMMARIZE
|
|
|
| B.1.2 Loading and Installing SPEAR
|
Both the documentation saveset <DOCUMENTATION> and the SPEAR
executable saveset <SUBSYS> are located in the <SUBSYS> area of the
monitor distribution tape. Therefore, you need not install SPEAR
separately; it is part of the monitor installation package.
|
| All the files listed in Section B.1.1 must reside in the same
| directory for SPEAR to operate properly.
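If you keep a copy of the SPEAR files outside the DECSYSTEM-20 (for example, files extracted from a tape image), a quick check for the complete set of files listed in Section B.1.1 can catch a missing file before you try to run SPEAR. This Python sketch is a host-side illustration only, not a TOPS-20 procedure. It assumes file names may carry a trailing TOPS-20 generation number (such as SPEAR.EXE.3), which it ignores, and compares names case-insensitively; the function names are hypothetical.

```python
import os

# The files listed in Section B.1.1.
REQUIRED_FILES = (
    "SPEAR.DOC", "DEFINE.LIS",
    "SPEAR.SPE", "SPEAR.EXE", "RFB.EYE", "MSGARG.SPT", "RETRFB.SPE",
    "SPRRET.SPE", "SPRRET.EXE", "SPRSUM.SPE", "SPRSUM.EXE",
)

def normalize(name: str) -> str:
    """Upper-case NAME.TYPE, dropping any trailing generation number."""
    return ".".join(name.upper().split(".")[:2])

def missing_files(present_names) -> list[str]:
    """Required files not found among the given names."""
    present = {normalize(name) for name in present_names}
    return [name for name in REQUIRED_FILES if name not in present]

def check_directory(path: str) -> list[str]:
    """List the required files that are absent from directory `path`."""
    return missing_files(os.listdir(path))
```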
| APPENDIX C
|
| COMMAND AND CONTROL FILES
|
|
|
| Because of dialogue changes in RETRIEVE and SUMMARIZE, if you have
| existing SPEAR V1.0 command or control files, you must change them for
| SPEAR V2.0 or they will not run.
|
| For RETRIEVE, the changes from V1.0 to V2.0 are in the Selection type,
| Error and Nonerror fields. No changes are necessary if your command
| or control file specified a Selection type of Error, All. See Section
| 4.3.3 for the RETRIEVE dialogue changes.
|
| You can maintain the same functionality for an error selection by
| changing the V1.0 dialogue to the following V2.0 dialogue:
|
SPEAR V1.0              SPEAR V2.0

@SPEAR                  @SPEAR
*RETRIEVE               *RETRIEVE
*SERR:ERROR.SYS         *SERR:ERROR.SYS
*INCLUDED               *INCLUDED
*ERROR                  *ERROR
*DISK                   *DISK
*RP06                   *RP06
*FINISHED               *ALL (Here's the difference.)
*EARLIEST               *FINISHED
*LATEST                 *EARLIEST
*DSK:RETRIE.RPT         *LATEST
*/GO                    *DSK:RETRIE.RPT
                        */GO
|
| To retrieve the events for specific device error types, replace ALL
| in the preceding V2.0 control file with one or more device error
| types, for example, Software, Bus, or Channel-controller.
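Because the error-selection change is mechanical, a site with many V1.0 control files could script it. The following Python sketch is not part of SPEAR: the function name convert_retrieve_error_v1_to_v2 and the constant V1_RETRIEVE_ERROR are hypothetical, and the sketch assumes a control file is held as a list of lines exactly like the RETRIEVE error dialogue above, with the device selection ending in a *FINISHED line. It inserts the V2.0 device error type answer (ALL by default) just before that line.

```python
from typing import List

# Hypothetical example data: the V1.0 RETRIEVE error dialogue shown above.
V1_RETRIEVE_ERROR = [
    "@SPEAR", "*RETRIEVE", "*SERR:ERROR.SYS", "*INCLUDED", "*ERROR",
    "*DISK", "*RP06", "*FINISHED", "*EARLIEST", "*LATEST",
    "*DSK:RETRIE.RPT", "*/GO",
]


def convert_retrieve_error_v1_to_v2(v1_lines: List[str],
                                    error_types: str = "ALL") -> List[str]:
    """Return a V2.0 version of a V1.0 RETRIEVE error-selection file.

    The device error type answer goes immediately before the *FINISHED
    line that ends the device selection, matching the V2.0 dialogue.
    The input list is not modified.
    """
    if "*FINISHED" not in v1_lines:
        raise ValueError("no *FINISHED line ends the device selection")
    i = v1_lines.index("*FINISHED")
    return v1_lines[:i] + ["*" + error_types] + v1_lines[i:]


if __name__ == "__main__":
    print("\n".join(convert_retrieve_error_v1_to_v2(V1_RETRIEVE_ERROR)))
```

Passing a value such as "SOFTWARE" for error_types selects a single error type. Whether several types may be entered on one line separated by commas is an assumption here, based on how the other V2.0 prompts in this appendix accept lists.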
|
| For Nonerror selection, you can now select specific devices. Instead
| of Nonerror, specify Statistics, Configuration, Diagnostics, Other, or
| a combination of these, separated by commas.
|
|     SPEAR V1.0            SPEAR V2.0
|
|     @SPEAR                @SPEAR
|     *RETRIEVE             *RETRIEVE
|     *SERR:ERROR.SYS       *SERR:ERROR.SYS
|     *INCLUDED             *INCLUDED
|     *NONERROR             *STATISTICS,DIAGNOSTICS   (Change)
|     *EARLIEST             *DISK                     (Change)
|     *LATEST               *RA60,RA80,RA81           (Change)
|     *DSK:RETRIE.RPT       *FINISHED                 (Change)
|     */GO                  *EARLIEST
|                           *LATEST
|                           *DSK:RETRIE.RPT
|                           */GO
|
| For SUMMARIZE, two new prompts have been added to the dialogue:
| Category and Show Error Distribution. You can maintain the same
| functionality by changing the V1.0 dialogue to the following V2.0
| dialogue:
|
|     SPEAR V1.0            SPEAR V2.0
|
|     @SPEAR                @SPEAR
|     *SUMMARIZE            *SUMMARIZE
|     *SERR:ERROR.SYS       *SERR:ERROR.SYS
|     *EARLIEST             *ALL                      (Change)
|     *LATEST               *EARLIEST
|     *DSK:SUMMAR.RPT       *LATEST
|     */GO                  *YES                      (Change)
|                           *DSK:SUMMAR.RPT
|                           */GO
|
| To get summaries for a specific device or class of devices, replace
| ALL in the preceding V2.0 dialogue with a device selection. For example:
|
| SPEAR V2.0
|
| @SPEAR
| *SUMMARIZE
| *SERR:ERROR.SYS
| *DISK
| *RA60,RA80
| *FINISHED
| *EARLIEST
| *LATEST
| *YES
| *DSK:SUMMAR.RPT
| */GO
|
| To suppress the error distribution charts, change the YES to NO in the
| dialogue.
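The SUMMARIZE conversion can be sketched the same way. This Python sketch is not part of SPEAR; convert_summarize_v1_to_v2 and V1_SUMMARIZE are hypothetical names, and the sketch assumes the V1.0 answers appear in the order shown above (input file, earliest time, latest time, output file, */GO). It adds the Category answer after the input file and the Show Error Distribution answer after the latest time.

```python
from typing import List, Sequence

# Hypothetical example data: the V1.0 SUMMARIZE dialogue shown above.
V1_SUMMARIZE = [
    "@SPEAR", "*SUMMARIZE", "*SERR:ERROR.SYS",
    "*EARLIEST", "*LATEST", "*DSK:SUMMAR.RPT", "*/GO",
]


def convert_summarize_v1_to_v2(v1_lines: List[str],
                               category: Sequence[str] = ("ALL",),
                               show_distribution: bool = True) -> List[str]:
    """Add answers for the two new V2.0 SUMMARIZE prompts.

    Assumes the V1.0 order shown above: *SUMMARIZE, input file,
    earliest time, latest time, output file, */GO.
    """
    i = v1_lines.index("*SUMMARIZE")
    head = v1_lines[: i + 2]         # everything through the input file
    times = v1_lines[i + 2 : i + 4]  # earliest and latest time answers
    rest = v1_lines[i + 4 :]         # output file and */GO
    category_lines = ["*" + answer for answer in category]   # Category prompt
    distribution = ["*YES" if show_distribution else "*NO"]  # Show Error Distribution
    return head + category_lines + times + distribution + rest
```

For the device example above, pass category=("DISK", "RA60,RA80", "FINISHED"); to suppress the error distribution charts, pass show_distribution=False.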
|
|
|
|
|
|
|
|
|
|
|
|
| APPENDIX D
|
| EVENT CODES
|
|
|
| The following table lists the current TOPS-10 and TOPS-20 event codes
| (in octal), along with the internal class and subsystem of each event.
| A dash entry (---) indicates that the event code does not exist under
| that operating system.
|
|
| Table D-1: TOPS-10 and TOPS-20 Event Codes
|
|
| -10    Name                 -20    Internal      Subsystem
| Code                        Code   Class
|
| 001    SYSTEMRELOAD         101    ERROR         MONITOR
| 002    MONITORBUGDATA       102    ERROR         MONITOR
| 005    EXTRACTEDCRASHINFO   ---    ERROR         MONITOR
| 006    CHANNELERRORREPORT   ---    ERROR         MAINFRAME
| 007    DAEMONSTARTED        ---    CONFIG        SOFTWARE
| 010    OLD DISK ERROR       ---    ERROR         DISK
| 011    MASSBUSERR           111    ERROR         DISK/TAPE
| 012    DX20ERR              ---    ERROR         DISK/TAPE
| 014    SOFTWAREEVENT        ---    ERROR         SOFTWARE
| ---    STATISTICS           114    STATISTICS    DISK/TAPE
| 015    CONFIGCHANGE         115    CONFIG        (ALL)
| 016    SYSERRORLOG          116    ERROR         SOFTWARE
| 017    SOFTWAREREQDATA      ---    ERROR         SOFTWARE
| 021    TAPEERR              ---    ERROR         TAPE
| 030    FEDEVICE-ERR         130    ERROR/CONFIG  MAIN/UNIT/COMM
| 031    FERELOAD             131    CONFIG        MAINFRAME
| 033    KSHALTSTATUS         133    ERROR         MAINFRAME
| 040    OLDDISKSTATS         ---    STATISTICS    DISK
| 042    TAPESTATS            ---    STATISTICS    TAPE
| 045    DISKSTATS            ---    STATISTICS    DISK
| 050    DLHARDWAREERROR      ---    ERROR         COMM
| 052    KLPARNXMINT          ---    ERROR         MAINFRAME
| 054    KSNXMTRAP            ---    ERROR         MAINFRAME
| 055    KLORKSPARTRAP        ---    ERROR         MAINFRAME
| 056    NXMMEMORYSWEEP       ---    ERROR         MAINFRAME
| 057    PARMEMORYSWEEP       ---    ERROR         MAINFRAME
| 061    CPUPARTRAP           160    ERROR         MAINFRAME
| 062    CPUPARINT            162    ERROR         MAINFRAME
| 063    KLCPUSTATUS          163    ERROR         CRASH
| 064    DEVICESTATUS         ---    ERROR         CRASH
| ---    MF20ERR              164    ERROR         MAINFRAME
| 066    OLDKLADDRESSFAIL     ---    ERROR         MAINFRAME
| 067    KLADDRESSFAIL        ---    ERROR         MAINFRAME
| 071    LP100ERR             ---    ERROR         UNITRECORD
| 072    HARDCOPYERR          ---    ERROR         UNITRECORD
| 201    NETCONSTARTED        201    CONFIG        NETWORK
| 202    NODEDOWNLINELOAD     202    CONFIG        NETWORK
| 203    NODEDOWNLINEDUMP     203    CONFIG        NETWORK
| 210    NETHARDWAREERR       210    ERROR         NETWORK
| 211    NETSOFTWAREERR       211    ERROR         NETWORK
| 220    NETOPRLOGENTRY       220    ERROR         NETWORK
| 221    NETTOPOLOGYCHANGE    221    CONFIG        NETWORK
| 222    NETCHECK11REPORT     222    CONFIG        NETWORK
| 230    NETLINESTATS         230    STATISTICS    NETWORK
| 231    NETNODESTATS         231    STATISTICS    NETWORK
| 232    OLDDN64STATS         232    STATISTICS    NETWORK
| 233    DN6XSTATS            233    STATISTICS    NETWORK
| 234    DN6XENABLEDISABLE    234    CONFIG        NETWORK
| 240    PHASE III DECNET     240    ERROR         NETWORK
| 242    HSC50 END PACKET     242    ERROR         DISK/TAPE
| 243    HSC50 ERROR LOG      243    ERROR         DISK/TAPE
| 244    KLIPA EVENT          244    ERROR         CI
| 245    MSCP ERROR           245    ERROR         CI
| 250    DIAGNOSTIC EVENT     250    DIAGNOSTIC    (ALL)
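Because the codes in Table D-1 are octal (for example, TOPS-10 code 061 is decimal 49), a program that looks them up must read them in base 8. The Python sketch below holds a few rows of the table keyed by TOPS-10 code; the names EventCode, BY_TOPS10, octal, and tops20_equivalent are hypothetical, and a full version would also need an index by TOPS-20 code for rows such as MF20ERR that exist only under TOPS-20.

```python
from typing import Dict, NamedTuple, Optional


class EventCode(NamedTuple):
    name: str
    tops20: Optional[int]   # None where Table D-1 shows ---
    internal_class: str
    subsystem: str


# An excerpt of Table D-1, keyed by TOPS-10 code. The codes are octal,
# so they are written with Python's 0o prefix.
BY_TOPS10: Dict[int, EventCode] = {
    0o001: EventCode("SYSTEMRELOAD", 0o101, "ERROR", "MONITOR"),
    0o005: EventCode("EXTRACTEDCRASHINFO", None, "ERROR", "MONITOR"),
    0o011: EventCode("MASSBUSERR", 0o111, "ERROR", "DISK/TAPE"),
    0o061: EventCode("CPUPARTRAP", 0o160, "ERROR", "MAINFRAME"),
    0o250: EventCode("DIAGNOSTIC EVENT", 0o250, "DIAGNOSTIC", "(ALL)"),
}


def octal(code: Optional[int]) -> str:
    """Format a code the way Table D-1 prints it (three octal digits or ---)."""
    return "---" if code is None else format(code, "03o")


def tops20_equivalent(tops10_code: int) -> str:
    """Return the TOPS-20 code for a TOPS-10 event code, as printed in the table."""
    return octal(BY_TOPS10[tops10_code].tops20)
```

With this data, tops20_equivalent(0o061) gives "160" and tops20_equivalent(0o005) gives "---".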
|
|
|
|
|
|
|
|
|
|
|
|
| APPENDIX E
|
| DISK SUBSYSTEM ERROR BITS
|
|
|
| The following charts show the SUMMARIZE report category into which
| each disk subsystem error bit falls.
|
| For example, if the SUMMARIZE report states that your RP06 has 6 SK-SR
| (SEEK-SEARCH) errors, you may want to know which specific RP06 error
| bits fall into this category. In the SK-SR chart, the rows for device
| RP04,5,6 (meaning RP04, RP05, or RP06) show that any one of the three
| error bits listed counts as a SEEK-SEARCH error.
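The SK-SR rule in this example can be written as a short test. This Python sketch is not SPEAR code; the function names are hypothetical, the register contents are assumed to be available as integers (for instance, copied from a RETRIEVE listing), and it assumes bit 0 is the least significant bit of each 16-bit drive register.

```python
def bit_set(reg: int, bit: int) -> bool:
    """True if `bit` is set in `reg`, counting bit 0 as the least significant."""
    return (reg >> bit) & 1 == 1


def rp0456_seek_search(err1: int, err3: int) -> bool:
    """SK-SR chart, RP04,5,6 rows: any one of the three bits qualifies."""
    return (bit_set(err3, 14)       # SEEK INC, ERR 3 bit 14
            or bit_set(err3, 15)    # OFF CYL, ERR 3 bit 15
            or bit_set(err1, 7))    # HEADER COMP ERR, ERR 1 bit 07
```

Note that the same bit number can mean different things in different registers: bit 15 of ERR 3 is OFF CYL (SK-SR), but bit 15 of ERR 1 is DATA CHECK, which the READ chart places in another category.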
|
| The headings have the following meanings:
|
| ERROR NAME The name listed in the KL10 Maintenance Guide
|
| DEVICE The device type
|
| REG The register containing the error bit
|
| BIT The position of the error bit
|
| COMMENTS Any qualifiers if applicable
|
| The charts that follow are:
|
| TIMIN = TIMING
| SK-SR = SEEK-SEARCH
| READ = READ-WRITE
| CH-CO = CHANNEL-CONTROLLER
| BUS = BUS
| SOFT = HARDWARE DETECTED SOFTWARE ERROR
| MICRO = MICROPROCESSOR DETECTED ERROR
| UNSAF = UNSAFE
| WRTLK = WRITE LOCK
| OFFLI = OFFLINE
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * TIMIN *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| OP INC              RP04,5,6   ERR 1     13
| DRIVE TIMING ERR    RP04,5,6   ERR 1     12
| INDEX ERROR         RP04,5,6   ERR 2     11
|
| INDEX UNSAFE        RP07       ERR 3     06
| DRIVE TIMING ERR    RP07       ERR 1     12
| OP INC              RP07       ERR 1     13
|
| OP INC              RM03,5     ERR 1     13
|
| OP INC              RK07       RKER      13
| DRIVE TIMING ERR    RK07       RKER      12
|
| E0                  RL02       RLCS             See note after last chart
| E3                  RL02       RLCS             See note after last chart
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * SK-SR *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| SEEK INC            RP04,5,6   ERR 3     14
| OFF CYL             RP04,5,6   ERR 3     15
| HEADER COMP ERR     RP04,5,6   ERR 1     07
|
| SEEK INC            RP07       ERR 3     14
| LOSS CYL ERROR      RP07       ERR 3     09
| HEADER COMP ERR     RP07       ERR 1     07
|
| HEADER COMP ERR     RM03,5     ERR 1     07
| SEEK INC            RM03,5     ERR 2     14
|
| SEEK INCOMPLETE     RK07       RKER      01
| DRIVE OFF TRACK     RK07       RKDS      05
| HEADER VERTICALRC   RK07       RKER      08
|
| SEEK TIME OUT       RL02       RLMP      12
| E1                  RL02       RLCS             See note after last chart
|
|
|
|
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * READ *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| DATA CHECK          RP04,5,6   ERR 1     15
| HEADER CRC ERR      RP04,5,6   ERR 1     08
| FORMAT ERR          RP04,5,6   ERR 1     04
|
| BAD SECTOR ERR      RP07       ERR 3     15
| DATA CHECK          RP07       ERR 1     15
| HEADER CRC ERR      RP07       ERR 1     08
| FORMAT ERR          RP07       ERR 1     04
| SYNC BYTE ERROR     RP07       ERR 3     02
|
| BAD SECTOR ERR      RM03,5     ERR 2     15
| DATA CHECK          RM03,5     ERR 1     15
| HEADER CRC ERR      RM03,5     ERR 1     08
| FORMAT ERR          RM03,5     ERR 1     04
|
| BAD SECTOR ERR      RK07       RKER      07
| DATA CHECK          RK07       RKER      15
| ECC HARD ERR        RK07       RKER      06
| FORMAT ERR          RK07       RKER      04
|
| E2                  RL02       RLCS             See note after last chart
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * CH-CO *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| CHAN ERR            RH10       CONI      20
| OVER RUN            RH10       CONI      22     and no drive errors
|
| CHAN ERR            RH20       CONI      22
| OVER RUN            RH20       CONI      26     and no drive errors
|
| IS TIMEOUT          RH780      MBA SR    01
| RD SUB              RH780      MBA SR    02
| INV MAP             RH780      MBA SR    04
| MAP PE              RH780      MBA SR    05
| DATA LATE           RH780      MBA SR    11     and no drive errors
|
| NON EX MEM          RH750      MBA SR    01
| SPE                 RH750      MBA SR    14
| INV MAP             RH750      MBA SR    04
| MAP PE              RH750      MBA SR    05
| DATA LATE           RH750      MBA SR    11     and no drive errors
|
| NON EX MEM          RK07       RKCS2     11
| DATA LATE           RK07       RKCS2     15
| WRITECHECK          RK07       RKCS2     14     and Not Data Check
|
| E4                  RL02       RLCS             See note after last chart
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * BUS *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| RAE                 RH10       CONI      29
| MDPE                RH10       CONI      18
| PARITY ERR          RH10       ERR 1     03
|
| RAE                 RH20       CONI      24
| MDPE                RH20       CONI      18     and no Class B device errors
| PARITY ERR          RH20       ERR 1     03
|
| MCPE                RH780      MBA SR    17
| NON EX DRIVE        RH780      MBA SR    18
| MDPE                RH780      MBA SR    06
| PARITY ERR          RH780      ERR 1     03
|
| MCPE                RH750      MBA SR    17
| NON EX DRIVE        RH750      MBA SR    18
| MDPE                RH750      MBA SR    06
| PARITY ERR          RH750      ERR 1     03
|
| PARITY ERR          RP07       ERR 1     03
| DATA PARITY ERROR   RP07       ERR 3     03
|
| NON EX DRIVE        RK07       RKCS2     12
| DR TO CNTRL PE      RK07       RKCS1     13
| CNTRL TO DR PE      RK07       RKER      03
| CONTROLLER TIMEOUT  RK07       RKCS1     11
| MULTIPLE DRIVE SEL  RK07       RKCS2     09
| UNIT FIELD ERR      RK07       RKCS2     08
|
| DRIVE SEL ERR       RL02       RLMP      08
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * SOFT *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| INVALID ADDR ERR    RP04,5,6   ERR 1     10
| ADDR OVERFLOW ERR   RP04,5,6   ERR 1     09
| REG MOD RFSD        RP04,5,6   ERR 1     02
| ILL REG             RP04,5,6   ERR 1     01
| ILL FUNCTION        RP04,5,6   ERR 1     00
|
| INVALID ADDR ERR    RP07       ERR 1     10
| ADDR OVERFLOW ERR   RP07       ERR 1     09
| REG MOD RFSD        RP07       ERR 1     02
| ILL REG             RP07       ERR 1     01
| ILL FUNCTION        RP07       ERR 1     00
| PROG ERR            RP07       ERR 2     15
|
| INVALID ADDR ERR    RK07       RKER      10
| PROGRAM ERROR       RK07       RKCS2     10
| ADDR OVERFLOW ERR   RK07       RKER      09
| DRIVE TYPE ERR      RK07       RKER      05
| NONEXECUTABLE FNC   RK07       RKER      02
| ILL FUNCTION        RK07       RKER      00
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * MICRO *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| CROM PARITY ERR     RP07       ERR 2     14
| MP UNSAFE           RP07       ERR 2     13
| DEFECT SKIP ERR     RP07       ERR 3     13
| CONTROL LOGIC FAIL  RP07       ERR 3     11
| LOSS OF BIT CLOCK   RP07       ERR 3     10
| MP HANDSHAKE        RP07       ERR 3     08
| SERDES DATA FAIL    RP07       ERR 3     04
| SYNC CLOCK FAIL     RP07       ERR 3     01
| RUNTIME OUT         RP07       ERR 3     00
| FAULT CODE          RP07       ERR 2     00-07  Any nonzero value
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * UNSAF *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| AC LOW              RP04,5,6   ERR 3     06
| DC LOW              RP04,5,6   ERR 3     05
| WR OS               RP05,6     ERR 3     01
| DC UN               RP05,6     ERR 3     00
| NO H SEL            RP04,5,6   ERR 2     10
| MULTI H SEL         RP04,5,6   ERR 2     09
| TRAN UNSF           RP04,5,6   ERR 2     06
| TRAN DET F          RP04,5,6   ERR 2     05
| C SW UNSF           RP04,5,6   ERR 2     03
| W SEL UNSF          RP04,5,6   ERR 2     02
| C SK UNSF           RP04,5,6   ERR 2     01
| ACUN                RP04       ERR 2     15
| PLO UNS             RP04,5,6   ERR 2     13
| 30VU                RP04       ERR 2     12
| WRITE UNSF          RP04,5,6   ERR 2     08
| WR C UNSF           RP04,5,6   ERR 2     00
|
| UNSAFE              RP07       ERR 1     14     REG 2<11-13>RD/WRT1-3,REG3<5>DC UNS
| R/W 3 UNSAFE        RP07       ERR 2     12
| R/W 2 UNSAFE        RP07       ERR 2     11
| R/W 1 UNSAFE        RP07       ERR 2     10
| WRITE OVERRUN       RP07       ERR 2     09
| WRITE READY UNSAF   RP07       ERR 2     08
| WRITE CURRENT FAIL  RP07       ERR 3     12
| DC UNSAFE           RP07       ERR 3     05
|
| UNSAFE              RM03,5     ERR 1     14
| DEVICE CHK          RM03,5     ERR 2     07
|
| UNSAFE              RK06,7     RKER      14
| SPEED LOSS          RK06,7     RKDS      04
| ACLO                RK06,7     RKDS      03
|
| WRITE DATA ERR      RL01,2     RLMP      15
| CURRENT HEAD ERR    RL01,2     RLMP      14
| SPEN ERR            RL01,2     RLMP      11
| WRITE GATE ERR      RL01,2     RLMP      10     and Not Write Locked
|
|
|
|
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * WRTLK *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| WRITE LOCK ERR      RP04,5,6   ERR 1     11
| WRITE LOCK ERR      RP07       ERR 1     11
| WRITE LOCK ERR      RM03,5     ERR 1     11
|
| WRITE LOCK ERR      RK07       RKER      11
| WRITE LOCK          RL02       RLMP      13     and Write Gate Error
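Some rows count only under a qualifier in the Comments column. For the RL02, the WRTLK chart counts WRITE LOCK (RLMP bit 13) only together with a write gate error, and the UNSAF chart counts WRITE GATE ERR (RLMP bit 10) only when the drive is not write locked. The Python sketch below, with a hypothetical function name, applies just those two rows; it assumes bit 0 is the least significant bit of RLMP and ignores the register's other error bits.

```python
from typing import Optional

WRITE_LOCK = 13       # RLMP bit in the WRTLK chart
WRITE_GATE_ERR = 10   # RLMP bit in the UNSAF chart


def rl02_write_gate_category(rlmp: int) -> Optional[str]:
    """Apply the write lock / write gate qualifiers from the two charts."""
    gate_error = (rlmp >> WRITE_GATE_ERR) & 1 == 1
    locked = (rlmp >> WRITE_LOCK) & 1 == 1
    if gate_error and locked:
        return "WRTLK"   # WRITE LOCK "and Write Gate Error"
    if gate_error:
        return "UNSAF"   # WRITE GATE ERR "and Not Write Locked"
    return None          # write lock alone matches neither row
```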
|
|
| *-*-*-*-*-*-*-*-*-*-*
| * *
| * OFFLI *
| * *
| *-*-*-*-*-*-*-*-*-*-*
|
|
|
|
| ERROR NAME          DEVICE     REG       BIT    Comments
| __________________________________________________
|
| MEDIUM ON LINE      RP04,5,6   DS        12     OFFLINE when not true
|
| MEDIUM ON LINE      RP07       DS        12     OFFLINE when not true
|
| MEDIUM ON LINE      RM03,5     DS        12     OFFLINE when not true
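Taken together, the charts amount to a lookup from (device, register, bit) to a SUMMARIZE category. The following Python sketch collects only the RP04,5,6 rows that use ERR 1, plus the OFFLI test on the DS register. It is an illustration, not SPEAR's implementation: the names are hypothetical, bit 0 is assumed to be the least significant bit, ERR 1 bits that do not appear in the RP04,5,6 rows (14, 6, 5, and 3) are simply ignored, and the real SUMMARIZE may apply further qualifiers.

```python
from typing import Dict, List, Tuple

# RP04,5,6 rows that use ERR 1, collected from the charts above:
# bit -> (error name, SUMMARIZE category)
RP0456_ERR1: Dict[int, Tuple[str, str]] = {
    15: ("DATA CHECK", "READ"),
    13: ("OP INC", "TIMIN"),
    12: ("DRIVE TIMING ERR", "TIMIN"),
    11: ("WRITE LOCK ERR", "WRTLK"),
    10: ("INVALID ADDR ERR", "SOFT"),
    9: ("ADDR OVERFLOW ERR", "SOFT"),
    8: ("HEADER CRC ERR", "READ"),
    7: ("HEADER COMP ERR", "SK-SR"),
    4: ("FORMAT ERR", "READ"),
    2: ("REG MOD RFSD", "SOFT"),
    1: ("ILL REG", "SOFT"),
    0: ("ILL FUNCTION", "SOFT"),
}

MEDIUM_ON_LINE = 12  # DS register bit in the OFFLI chart


def classify_rp0456_err1(err1: int) -> List[Tuple[str, str]]:
    """Return (name, category) for each charted ERR 1 bit that is set."""
    return [RP0456_ERR1[bit] for bit in sorted(RP0456_ERR1, reverse=True)
            if (err1 >> bit) & 1]


def rp0456_offline(ds: int) -> bool:
    """OFFLI chart: the drive is OFFLINE when MEDIUM ON LINE is not set."""
    return (ds >> MEDIUM_ON_LINE) & 1 == 0
```

For example, an ERR 1 value with bits 15 and 7 set yields one READ entry (DATA CHECK) and one SK-SR entry (HEADER COMP ERR).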
|
|
|
|
| !***** RL02 NOTE *****
| !
| ! THESE 3 BITS (10, 11, AND 12) OF THE CS REGISTER ARE GROUPED TO
| ! DETERMINE THE ERROR AS FOLLOWS (x means the state of the bit does
| ! not matter):
| !
| !  12   11   10    RESULT
| !  DLT  CRC  OPI
| !  0    0    1     = OPI                               E0
| !  x    1    1     = HEADER CHECK                      E1
| !  x    1    0     = DATA CRC IF READ OPERATION,       E2
| !                    WRITE CHECK IF WRITE OPERATION
| !  1    x    1     = HEADER NOT FOUND                  E3
| !  1    x    0     = DATA LATE                         E4
| !
| !*****
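The note above is a small pattern-matching table, which the following Python sketch decodes. It also reports the chart in which each code appears (E0 and E3 in TIMIN, E1 in SK-SR, E2 in READ, E4 in CH-CO). The names are hypothetical and bit 0 is assumed to be the least significant bit of CS. Two patterns overlap (bits 12-10 = 1 1 0 match both E2 and E4, and 1 1 1 match both E1 and E3); the note does not say which wins, so this sketch takes the first matching row in the note's order. That tie-break is an assumption, not documented SPEAR behavior.

```python
from typing import Dict, Optional, Tuple

# Rows of the RL02 note: ((bit 12 DLT, bit 11 CRC, bit 10 OPI), code, name).
# None stands for "x" (the state of the bit does not matter).
RL02_ROWS = [
    ((0, 0, 1), "E0", "OPI"),
    ((None, 1, 1), "E1", "HEADER CHECK"),
    ((None, 1, 0), "E2", "DATA CRC"),          # WRITE CHECK on a write
    ((1, None, 1), "E3", "HEADER NOT FOUND"),
    ((1, None, 0), "E4", "DATA LATE"),
]

# SUMMARIZE chart in which each code is listed.
RL02_CATEGORY: Dict[str, str] = {
    "E0": "TIMIN", "E3": "TIMIN", "E1": "SK-SR", "E2": "READ", "E4": "CH-CO",
}


def decode_rl02_cs(cs: int,
                   write_operation: bool = False
                   ) -> Optional[Tuple[str, str, str]]:
    """Return (code, name, category), or None if no row matches."""
    bits = ((cs >> 12) & 1, (cs >> 11) & 1, (cs >> 10) & 1)
    for pattern, code, name in RL02_ROWS:
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            if code == "E2" and write_operation:
                name = "WRITE CHECK"
            return code, name, RL02_CATEGORY[code]
    return None
```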
|
|
|
|
|
|
|
|
|
|
|
|
| APPENDIX F
|
| NETWORK EVENT PARAMETERS
|
|
|
| Network Management Layer Event Parameters - Class 0
|
| Type Keywords
|
| 0 SERVICE
| 0 = LOAD 1 = DUMP
| 1 STATUS
| Return code
| 0 = REQUESTED
| >0 = SUCCESSFUL
| <0 = FAILED
| Error detail (if error)
| Error message (optional)
| 2 OPERATION
| 0 = INITIATED
| 1 = TERMINATED
| 3 REASON
| 0 = Receive timeout
| 1 = Receive error
| 2 = Line state change by higher level
| 3 = Unrecognized request
| 4 = Line open error
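For the STATUS parameter, only the sign of the return code matters: zero means requested, a positive value means successful, and a negative value means failed. The Python sketch below transcribes the Class 0 value tables and decodes the return code; the names are hypothetical, and how the raw parameter values are extracted from an event entry is outside its scope.

```python
# Value tables transcribed from the Class 0 parameter list above.
SERVICE = {0: "LOAD", 1: "DUMP"}
OPERATION = {0: "INITIATED", 1: "TERMINATED"}
REASON = {
    0: "Receive timeout",
    1: "Receive error",
    2: "Line state change by higher level",
    3: "Unrecognized request",
    4: "Line open error",
}


def status_meaning(return_code: int) -> str:
    """Decode the STATUS return code from its sign alone."""
    if return_code == 0:
        return "REQUESTED"
    return "SUCCESSFUL" if return_code > 0 else "FAILED"
```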
|
|
| Session Control Layer Event Parameters - Class 2
|
| Type Keywords
|
| 0 REASON
| 0 = Operator command
| 1 = Normal operation
| 1 OLD STATE
| 0 = ON 2 = SHUT
| 1 = OFF 3 = RESTRICTED
| 2 NEW STATE
| 0 = ON 2 = SHUT
| 1 = OFF 3 = RESTRICTED
| 3 SOURCE NODE
| 4 SOURCE PROCESS
| 5 DESTINATION PROCESS
| 6 USER
| 7 PASSWORD (0 means password set; no
| parameter means not set)
| 8 ACCOUNT
|
| Network Services Layer Event Parameters - Class 3
|
| Type Keywords
|
| 0 MESSAGE
| Message flags
| Destination link address
| Source link address
| Data
| 1 CURRENT FLOW CONTROL
| 0 = No flow control
| 1 = Segment flow control
| 2 = Message flow control
|
| Routing Layer Event Parameters - Class 4
|
| Type Keywords
|
| 0 PACKET HEADER
| Message flags
| Destination node address
| (not for control packet)
| Source node address
| Forwarding data
| (not for control packet)
| 1 PACKET BEGINNING
| 2 HIGHEST ADDRESS
| 3 NODE
| 4 EXPECTED NODE
| 5 REASON
| 0 = Line synchronization lost
| 1 = Data errors
| 2 = Unexpected packet type
| 3 = Routing update checksum error
| 4 = Adjacent node address change
| 5 = Verification receive timeout
| 6 = Version skew
| 7 = Adjacent node address out of range
| 8 = Adjacent node block size too small
| 9 = Invalid verification seed value
| 10 = Adjacent node listener received timeout
| 11 = Adjacent node listener received invalid
| data
| 6 RECEIVED VERSION
| 7 STATUS
| 0 = REACHABLE 1 = UNREACHABLE
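A program that reports Class 4 events needs to tolerate REASON values beyond the ones listed, for example from a newer node. The Python sketch below transcribes the REASON and STATUS values as they appear above and falls back to a generic message for unknown values; the names and the fallback wording are hypothetical.

```python
ROUTING_REASON = {
    0: "Line synchronization lost",
    1: "Data errors",
    2: "Unexpected packet type",
    3: "Routing update checksum error",
    4: "Adjacent node address change",
    5: "Verification receive timeout",
    6: "Version skew",
    7: "Adjacent node address out of range",
    8: "Adjacent node block size too small",
    9: "Invalid verification seed value",
    10: "Adjacent node listener received timeout",
    11: "Adjacent node listener received invalid data",
}
NODE_STATUS = {0: "REACHABLE", 1: "UNREACHABLE"}


def routing_reason(value: int) -> str:
    """Decode the REASON parameter, tolerating values not in the table."""
    return ROUTING_REASON.get(value, f"unknown reason {value}")
```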
|
| Data Link Layer Event Parameters - Class 5
|
| Type Keywords
|
| 0 OLD STATE
| 0 = HALTED 3 = RUNNING
| 1 = ISTRT 4 = MAINTENANCE
| 2 = ASTRT
| 1 NEW STATE
| 0 = HALTED 3 = RUNNING
| 1 = ISTRT 4 = MAINTENANCE
| 2 = ASTRT
| 2 HEADER
| 3 SELECTED TRIBUTARY
| 4 PREVIOUS TRIBUTARY
| 5 TRIBUTARY STATUS
| 0 = Streaming
| 1 = Continued send after timeout
| 2 = Continued send after deselect
| 3 = End streaming
| 6 RECEIVED TRIBUTARY
| 7 BLOCK LENGTH
| 8 BUFFER LENGTH
| 9 DTE
| 10 REASON
| 11 (Reserved)
| 12 (Reserved)
| 13 PARAMETER TYPE
| 14 CAUSE
| 15 DIAGNOSTIC
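The OLD STATE and NEW STATE parameters use the same value table, so a line-state event can be summarized as one transition. The following Python sketch, with hypothetical names, shows that combination; the arrow format is only an illustration.

```python
LINE_STATE = {0: "HALTED", 1: "ISTRT", 2: "ASTRT", 3: "RUNNING", 4: "MAINTENANCE"}


def state_name(value: int) -> str:
    """Name a Class 5 line state, tolerating values not in the table."""
    return LINE_STATE.get(value, f"unknown state {value}")


def describe_state_change(old_state: int, new_state: int) -> str:
    """Combine the OLD STATE and NEW STATE parameters into one line."""
    return f"{state_name(old_state)} -> {state_name(new_state)}"
```

For example, describe_state_change(3, 0) returns "RUNNING -> HALTED".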
|
| Physical Line Layer Event Parameters - Class 6
|
| Type Keywords
|
| 0 DEVICE REGISTER
| 1 NEW STATE
| 0 = OFF
| 1 = ON
APPENDIX G
GLOSSARY
The following terms are explained in the context of this document.

Term                Explanation

Body section        The data portion of an entry in the
                    system event file.

BUGCHK              A recoverable error detected by the
                    TOPS-20 operating system.

BUGHLT              A non-recoverable error detected by the
                    TOPS-20 operating system.

BUGINF              A message informing you that a certain
                    event relating to the TOPS-20 operating
                    system has occurred.

CTY                 The system operator's terminal.

Dump format         One of the three output forms of the
                    RETRIEVE procedure.

Entry type          The type of entry within a system event
                    file, for example, a MASSBUS Device
                    Error or a Crash Restart Error.

ERROR.SYS           The name of the system event file in
                    both the TOPS-10 and TOPS-20 operating
                    systems.

Event code          The octal code assigned to a
                    particular event in the system event
                    file.

FRU                 An acronym for Field Replaceable Unit:
                    a piece of hardware that the Field
                    Service engineer can replace on the
                    spot.

Full format         A complete and detailed listing of an
                    event in ASCII, as translated by
                    RETRIEVE.

Hard error          A non-recoverable error.

Header section      The top portion of an entry in the
                    system event file, after SPEAR formats
                    it.

MTTR                An acronym for Mean Time To Repair: the
                    average time it takes a Field Service
                    engineer to isolate and repair a system
                    malfunction.

NXM error           An attempt to address a nonexistent
                    memory location.

Parity error        An indication that one or more bits have
                    been picked up or dropped, causing a
                    parity mismatch.

RETRIE.RPT          A file containing entries converted from
                    binary to ASCII.

RETRIE.SYS          A file in binary format containing
                    entries extracted from the system event
                    file.

Retry count         The number of times an operation is
                    tried, in addition to the first time.

Sequence number     The number given to an entry in the
                    system event file.

Short format        A brief version of an entry in the
                    system event file, after SPEAR has
                    translated it.

Snapshot            The information gathered by the
                    operating system immediately after
                    recovering from a crash.

Soft error          A recoverable error.

Stopcode            A message, containing a 3-letter code,
                    that is printed at the CTY to indicate
                    that a serious error has occurred in the
                    operating system's data base.

Sweep               A check of core memory that the operating
                    system makes after certain events occur,
                    looking for more errors of the same kind.

System event file   The file where the operating system
                    records hardware and software events.
INDEX
ABS, 4-32
ACL, 4-32
ACU, 4-32
AOE, 4-32

Body section, 2-5, G-1
/BREAK switch, 4-4
BUGCHK, 2-2, 5-30, G-1
BUGHLT, 2-2, 5-30, G-1
BUGINF, 2-2, G-1

Channel failures, 2-3
Checking
  error, 3-1
  loop, 3-4
  range, 3-4
  software error, 3-4
  sum, 3-4
  validity, 3-4
Checksum, 3-4
CM, 5-14, 5-43
Command
  HELP, 4-3
Command and Control Files, C-1
Completing next field, 4-4
Conclusive statement, G-1
CONFIG program, 5-14
Configuration status change, 5-14, 5-43
Controller failures, 2-3
Conventions
  record, 2-6
COR/CRC, 4-33
CPU failures, 2-3
CPU status block, 5-26
Crash extract, 5-4
CS/ITM, 4-33
CSF, 4-32
CSU, 4-32
CTRL/F, 4-4
CTRL/U, 4-4
CTRL/W, 4-4
CTY, G-1

DAEMON started, 5-7
Data channel error, 5-7
DCK, 4-32
DCL, 4-32
DCU, 4-32
Deleting current line, 4-4
Deleting previous field, 4-4
Detecting
  error, 3-1
Device status block, 5-28
Device types, 4-6
Dialogue
  SPEAR, 4-2
Dialogue usage messages, A-3
DIS, 4-32
Disk statistics, 5-20
DL10 communications error, 5-22
DN, 5-14, 5-43
DPA, 4-33
DTE, 4-32, 4-33
Dump format, G-1
DX20 device error, 5-9, 5-39

ECH, 4-32
Entries
  hardware, 2-2
  performance, 2-4
  software, 2-2
  TOPS-10, 5-2
  TOPS-20, 5-30
Entry descriptions, 5-1
Entry type, G-1
Error bits, E-1
Error checking, 3-1
Error detecting, 3-1
Error detectors
  hardware, 3-1
  parity, 3-4
  threshold, 3-4
  timing, 3-4
Error register codes, 4-32
ERROR.SYS, G-1
Event Codes, D-1
Event codes, G-1
Event file, 4-10
Event file messages, A-5
Executing SPEAR, 4-4
Exiting from SPEAR, 4-5
Extra error reporting, 4-41

Failures
  channel, 2-3
  controller, 2-3
  CPU, 2-3
  I/O device, 2-3
  intermittent, 3-1
  memory, 2-3
  solid, 3-1
  types of, 3-1
FCE, 4-33
Features
  HELP, 4-3
FEN, 4-32
FER, 4-32
Field
  completing next, 4-4
  deleting previous, 4-4
File specifications, 4-4
Files
  indirect, 4-2
FMT, 4-33
Format
  full, 4-23
  octal, 4-19
  record, 2-5
  short, 4-18
Front end reload, 5-18
Front end reloaded, 5-45
Front-end device report, 5-18, 5-44
FRU, G-2
Full format, 4-23, G-2
Function
  KLSTAT, 4-1
  RETRIEVE, 4-1, 4-5
  SUMMARIZE, 4-1, 4-24

Glossary, G-1
/GO switch, 4-4

Hard error, G-2
Hardware entries, 2-2
Hardware error detectors, 3-1
HCE, 4-32
HCRC, 4-32
Header
  sample, 2-5
Header section, 2-5, G-2
HELP command, 4-3
Help features, 4-3
/HELP switch, 4-3, 4-4

I/O device failures, 2-3
IAE, 4-32
ILF, 4-32, 4-33
ILR, 4-32, 4-33
INC/UPE, 4-33
Indirect files, 4-2
Input
  RETRIEVE, 4-5
Installation procedures, B-1
Intermittent failures, 3-1
Isolation techniques, 3-5
IXE, 4-32

KL CPU status block, 5-49
KL10 parity interrupt, 5-22
KL10 parity trap, 5-24
KLERR entry, 4-9
KLERR front end report, 5-53
KLSTAT function, 4-1
KLSTAT mode, 5-49
KLSTAT procedure, 4-41
KLSTAT switch, 5-49
KS10 Halt status block, 5-19
KS10 NXM trap, 5-23

Library
  SPEAR, 4-1
Line printer error, 5-29
Loop checking, 3-4

Magtape statistics, 5-19
Magtape system error, 5-16
MASSBUS device error, 5-33
MASSBUS disk error, 5-8
MASSBUS disk registers, 4-32
Memory failures, 2-3
Memory sweep for NXM, 5-25
Memory sweep for parity, 5-26
MF20 device report, 5-50
MHS, 4-32
Minimum analysis, G-2
MSCP, G-2
MSE, 4-32
MTTR, G-2

NEF, 4-33
NETCON, 5-53
Network CHECK11 report, 5-56
Network control started, 5-53
Network down-line load, 5-54
Network entries, 5-53
Network event Classes, 4-7
Network event Parameters, F-1
Network hardware error, 5-55
Network line statistics, 5-57
Network up-line dump, 5-54
NHS, 4-32
Non-reload monitor error, 5-3
NSG, 4-33
NXM error, G-2

Octal format, 4-19
OCYL, 4-32
OPE, 4-32
OPI, 4-32, 4-33
OPR, 5-15, 5-44
OT, 5-14, 5-43

PAR, 4-32, 4-33
Parity error, G-2
Parity error detectors, 3-4
PEF/LRC, 4-33
Performance entries, 2-4
PLU, 4-32
PM, 5-14, 5-43
Procedure
  installation, B-1
  KLSTAT, 4-41
  RETRIEVE, 4-9
  SUMMARIZE, 4-34
Processor parity interrupt, 5-47
Processor parity trap, 5-46
PSU, 4-32

Question mark switch (/?), 4-4

R&W, 4-32
Range checking, 3-4
Record conventions, 2-6
Record format, 2-5
Report
  SUMMARIZE, 4-25
RETRIE.RPT, G-2
RETRIE.SYS, G-2
RETRIEVE error class, 4-6
RETRIEVE function, 4-1, 4-5
RETRIEVE input, 4-5
RETRIEVE output, 4-7
RETRIEVE procedure, 4-9
Retry count, G-2
Returning to previous prompt, 4-4
Returning to SPEAR prompt, 4-4
/REVERSE switch, 4-4
RMR, 4-32, 4-33
RP06, 5-9
Running SPEAR, 4-1

Sample header, 2-5
Sample RETRIEVE session, 4-18
Sample SUMMARIZE session, 4-39
Section
  body, 2-5
  header, 2-5
Separators, 4-2
Sequence number, G-2
Short format, 4-18, G-2
/SHOW switch, 4-5
SKI, 4-32
Snapshot, G-2
Soft error, G-2
Software entries, 2-2
Software error checking, 3-4
Software event, 5-13
Software requested data, 5-15
Solid failures, 3-1
SPEAR dialogue, 4-2
SPEAR library, 4-1
SPEAR messages, A-1
SPEAR switches, 4-4
STOPCD, 2-2, G-2
Stopcodes, 2-2, G-2
Sum checking, 3-4
SUMMARIZE function, 4-1, 4-24
SUMMARIZE procedure, 4-34
SUMMARIZE report, 4-25
Switch
  /GO, 4-4
  /HELP, 4-3, 4-4
  question mark, 4-4
  /REVERSE, 4-4
  /SHOW, 4-5
System event file, 5-1, G-2
System log entry, 5-15, 5-44
System reload, 5-2

TDF, 4-32
Techniques
  isolation, 3-5
  verification, 3-6
Terminators, 4-2
TGHA, 5-50
Threshold error detectors, 3-4
Time window, 3-6
Timing error detectors, 3-4
TOPS-10 entries, 5-2
TOPS-20 entries, 5-30
TOPS-20 system reloaded, 5-30
TUF, 4-32
Types of failures, 3-1

Unit record error, 5-30
UNS, 4-32, 4-33
User validation messages, A-1
UWR, 4-32

35V, 4-32
Validity checking, 3-4
Verification techniques, 3-6
30VU, 4-32
VUF, 4-32

Warning messages, A-4
WCF, 4-32
WCU, 4-32
WLE, 4-32
WOF, 4-32
WRU, 4-32
WSU, 4-32

XPORT messages, A-7