ANALYSING DISK FILES WITH I/O ERRORS USING THE PROGRAM READ.

READ is a program that reads each page of a file and reports
any exception conditions that are detected. A file page can be
located on a section of the disk surface that is imperfect and
therefore some exceptional procedures are required to properly
read the data. Examples of these procedures are: ECC
correction and head-offset correction. The monitor will invoke
either or both of these techniques in an effort to read the
data. However, BOOT will use neither, and this presents a
serious problem. It is possible to write a new monitor file
and to read it with no reported errors by using the TOPS20 file
system and disk drivers. However, the same file may not be
readable by BOOT since one or more of the pages may be located
in a "correctable" location on the disk. READ was developed
specifically to test the viability of new monitor files so that
BOOT would not fail due to any of the standard correctable
exceptions. However, it has also proven to be useful for analyzing
files that are difficult to read reliably. For a host of known
and unknown reasons, the error correction circuitry does not
always succeed in detecting or analyzing the cause of a disk
read failure. Therefore, a file page that is located on a
"correctable" portion of a disk's surface may occasionally be
unreadable. An even more bizarre condition may arise when some
portion of the disk I/O logic (the DCL, the RH20, the Massbus
driver and cables, etc.) fails intermittently, resulting in
transient errors that may be recoverable or unrecoverable. READ
has evolved to test for all of the conditions so far described
and to provide as much information as possible about the state
of individual disk file pages and the disk I/O subsystem.
Clearly, hardware failures may produce unpredictable results,
and no program can adequately protect itself from these
failures (nor, alas, can it analyze the failures sufficiently
to produce a reliable profile of the failure).
READ, however, has proven to be both a reliable analyzer of
"soft" disk failures and a portent of impending channel
and/or device failures.

Following is a description of each of the messages produced by
READ and a first-level interpretation of what provokes it:

?Hard error reading ...

This error
indicates that a PT (page table), a page (file page within the
PT), or the PTT (page table table for long files) is
unreadable. None of the known error recovery techniques
sufficed and the data contents are probably lost. There is a
possibility that the page can be read on another drive or
another massbus controller (if the failure is pattern-sensitive).

?Recoverable error reading ...

This error
indicates that a PT, a page, or the PTT is located on a
"correctable" portion of the disk's surface. Again, it is
possible that moving the pack to another drive or another
massbus controller will eliminate or change the condition.
However, the file in question will most likely not be readable
by BOOT.

%Transient error reading ...

This error indicates
that a PT, a page, or the PTT was readable without employing
extraordinary recovery techniques, but not on the first read
attempt. READ will attempt to read each page ten times before
resorting to error recovery. At least the first such attempt
resulted in an error, but one of the subsequent attempts
succeeded in reading the data. Such conditions are the most
difficult to analyze. There is a good chance BOOT will fail to
read the data; however, the cause of the problem is not
readily determined. If this error is reported for a large number
of pages on the disk, then it is likely that some hardware failure
is imminent and that the disk surface is fine. If the error is
isolated to a very small number of pages, and is reported
consistently for these same pages, then the problem is likely
on the disk surface. If this error occurs randomly on the
disk, and repeated attempts to run READ result in different
collections of failures, then an intermittent hardware problem
is indicated.

?Check sum error reading ...

This error
indicates that a PT or the PTT contains bad data. READ cannot
proceed with the file if it is a short file, or if the file is
long and the problem is in the PTT. If the file is long and
one of its PTs has a check sum error, none of the pages
belonging to that PT can be read.

Finally, a word on using READ: READ prompts with
"File name(s) to verify: ".
The user may type a single file specification with or without
wild card characters ("*" or "%"). READ will process all files
indicated, reporting any failures it finds. The user must be an
enabled wheel or operator in order to use READ.
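
The per-page decision described under the messages above can be
sketched in a few lines of Python. This is only an illustration of
the logic as this help text describes it, not READ's actual code
(READ is a TOPS-20 program). The names classify_page, plain_read,
recovery_read and interpret_transients are hypothetical, as is the
threshold "few". Check sum errors on a PT or the PTT are not modelled.

```python
from enum import Enum

# From the text: READ tries each page ten times before error recovery.
MAX_PLAIN_ATTEMPTS = 10


class Status(Enum):
    OK = "no error"
    TRANSIENT = "%Transient error"
    RECOVERABLE = "?Recoverable error"
    HARD = "?Hard error"


def classify_page(plain_read, recovery_read):
    """Classify one page read.

    plain_read and recovery_read are hypothetical callables that
    return True when the read succeeds. recovery_read stands in for
    ECC and head-offset style recovery.
    """
    for attempt in range(MAX_PLAIN_ATTEMPTS):
        if plain_read():
            # First-try success is clean; a later success is transient.
            return Status.OK if attempt == 0 else Status.TRANSIENT
    # Every plain attempt failed, so fall back to error recovery.
    return Status.RECOVERABLE if recovery_read() else Status.HARD


def interpret_transients(runs, few=3):
    """Rough reading of transient errors across repeated READ runs.

    runs is a list of sets of failing page numbers, one set per run.
    few is an assumed cutoff for "a very small number of pages".
    """
    if not any(runs):
        return "no transient errors"
    if any(r != runs[0] for r in runs):
        # Different pages fail on different runs.
        return "intermittent hardware problem"
    if len(runs[0]) <= few:
        # The same few pages fail every time.
        return "disk surface suspect"
    return "hardware failure may be imminent"


# Usage: a page that fails twice and then reads cleanly.
scripted = iter([False, False, True])
status = classify_page(lambda: next(scripted), lambda: True)
```

With a single run there is nothing to compare against, so the second
function can only distinguish "few pages" from "many pages"; the
consistency test becomes meaningful once READ has been run several
times on the same files.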