Trailing-Edge - PDP-10 Archives - BB-L288A-RM - swskit-utilities/read.hlp
ANALYSING DISK FILES WITH I/O ERRORS USING THE PROGRAM READ.

READ is a program that reads each page of a  file  and  reports
any exception conditions that are detected.  A file page can be
located on a section of the disk surface that is imperfect  and
therefore  some exceptional procedures are required to properly
read  the  data.   Examples  of  these  procedures  are:    ECC
correction and head-offset correction.  The monitor will invoke
either or both of these techniques in an  effort  to  read  the
data.   However,  BOOT  will  use  neither, and this presents a
serious problem.  It is possible to write a  new  monitor  file
and to read it with no reported errors by using the TOPS20 file
system and disk drivers.  However, the same  file  may  not  be
readable  by BOOT since one or more of the pages may be located
in a "correctable" location on the disk.   READ  was  developed
specifically to test the viability of new monitor files so that
BOOT would not fail due to  any  of  the  standard  correctable
exceptions.   However,  it has proven to be useful for analyzing
files that are difficult to read reliably.  For a host of known
and  unknown  reasons,  the error correction circuitry does not
always succeed in detecting or analyzing the cause  of  a  disk
read  failure.   Therefore,  a  file  page that is located on a
"correctable" portion of a disk's surface may  occasionally  be
unreadable.  An even more bizarre condition may arise when some
portion of the disk I/O logic (the DCL, the RH20,  the  Massbus
driver  and  cables,  etc.)  fails intermittently, resulting in
transient errors that may be recoverable or unrecoverable.  READ
has  evolved to test for all of the conditions so far described
and to provide as much information as possible about the  state
of  individual  disk  file  pages  and  the disk I/O subsystem.
Clearly, hardware failures may produce  unpredictable  results,
and  no  program  can  adequately  protect  itself  from  these
failures (nor, alas, can it analyze the  failures  sufficiently
to  produce  a  reliable  profile  of  the  failure).
READ, however, has proven to be both a reliable analyzer of
"soft" disk failures and a portent of impending channel and/or
device failures.

Following is a description of each of the messages produced by
READ and a first-level interpretation of its cause:

?Hard error reading ...

This error indicates that a PT (page table), a page (file page
within the PT), or the PTT (page table table for long files) is
unreadable.  None of the known error recovery techniques
sufficed and the data contents are probably lost.  There is a
possibility that the page can be read on another drive or
another Massbus controller (if the failure is
pattern-sensitive).

?Recoverable error reading ...

This error
indicates that a PT, a  page,  or  the  PTT  is  located  on  a
"correctable"  portion  of  the  disk's  surface.  Again, it is
possible that moving the pack to another drive or another
Massbus controller will eliminate or change the condition.
However, the file in question will most likely not be readable
by BOOT.

%Transient error reading ...

This error indicates
that a PT, a page, or the PTT was  readable  without  employing
extraordinary  recovery  techniques,  but not on the first read
attempt.  READ will attempt to read each page ten times  before
resorting  to  error recovery.  At least the first such attempt
resulted in an  error,  but  one  of  the  subsequent  attempts
succeeded  in  reading  the data.  Such conditions are the most
difficult to analyze.  There is a good chance BOOT will fail to
read  the  data;   however,  the  cause  of  the problem is not
readily determined.  If this error is reported for a number  of
pages on the disk, then it is likely that some hardware failure
is imminent and that the disk surface is fine.  If the error is
isolated  to  a  very  small  number  of pages, and is reported
consistently for these same pages, then the problem  is  likely
on  the  disk  surface.   If  this error occurs randomly on the
disk, and repeated attempts to run  READ  result  in  different
collections  of failures, then an intermittent hardware problem
is indicated.

?Check sum error reading ...

This error
indicates  that a PT or the PTT contains bad data.  READ cannot
proceed with the file if it is a short file, or if the file  is
long and the problem is in the PTT.  If the file is long and
one of its PTs has a check sum error, none of the pages
belonging to that PT can be read.

Finally, a word on using READ:  READ prompts with:
"File name(s) to verify:   ".
The  user  may type a single file specification with or without
wild card characters ("*" or "%").  READ will process all files
indicated, reporting any failures it finds.  The user must be an
enabled wheel or operator in order to use READ.
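
The relationship between the read attempts and the messages
described above can be sketched in Python.  This is an
illustration only and not READ's actual code.  The names
classify_page, interpret_transients, plain_read and recovery_read
are hypothetical, as are the threshold "small" and the order in
which the transient-error rules are checked.  The check sum
message is not modelled.

```python
from enum import Enum
from typing import Callable, List, Set

MAX_PLAIN_ATTEMPTS = 10  # the text says each page is tried ten times


class Outcome(Enum):
    CLEAN = "no message"
    TRANSIENT = "%Transient error reading"
    RECOVERABLE = "?Recoverable error reading"
    HARD = "?Hard error reading"


def classify_page(plain_read: Callable[[], bool],
                  recovery_read: Callable[[], bool]) -> Outcome:
    """Classify one page from the outcome of its read attempts.

    plain_read and recovery_read are hypothetical callables that
    return True on success.  plain_read models a read without ECC or
    head-offset correction; recovery_read models a read with them.
    """
    for attempt in range(1, MAX_PLAIN_ATTEMPTS + 1):
        if plain_read():
            # First-try success is clean; a later success is transient.
            return Outcome.CLEAN if attempt == 1 else Outcome.TRANSIENT
    # Every plain attempt failed, so fall back to error recovery.
    return Outcome.RECOVERABLE if recovery_read() else Outcome.HARD


def interpret_transients(runs: List[Set[int]], small: int = 3) -> str:
    """Apply the text's rules of thumb to transient errors from several runs.

    runs holds one set of failing page numbers per run of READ.  The
    threshold `small` and the order of the checks are assumptions; the
    text says only "a very small number of pages".
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    if not any(runs):
        return "no transient errors"
    if any(pages != runs[0] for pages in runs):
        # Different pages fail on different runs.
        return "intermittent hardware problem"
    if len(runs[0]) <= small:
        # The same few pages fail every time.
        return "disk surface problem"
    return "hardware failure may be imminent"
```

For example, a page whose plain read fails twice and then
succeeds would be classified as Outcome.TRANSIENT, and two runs
that both report transient errors on the same two pages would be
interpreted as a likely disk surface problem.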