Corrupted Tape Labels Cause MAS Mount Loop on MVS
There is a bug in MVS whereby you can trash the header label
set on a tape by canceling a job at exactly the right time while
an output tape is being opened. The cause of this bug has nothing
to do with the MAS, but if MVS does this to one of the MAS virtual
tape volumes, it can cause operational problems that require manual
intervention to resolve.
What happens is that one or more of the volume header labels
gets written, then the tape gets unloaded before the label set
is finished and the tape is properly closed. When
the canceled job is done, the tape contains a valid VOL1 and
may contain one or more HDRn labels, but has no tapemark(s) ending the
header label set.
What Happens Next on a Real Tape
On a real tape, this would cause unpredictable results.
When the tape is next accessed, there might be an I/O error,
a complaint from the operating system about the label
contents, or no error at all. It all depends on the
prior contents of the tape and a bit of luck. This
is because when you write to a tape, the old data on the
tape media remains in place following the newly written
data (or tapemark). Anything past the newly written data
(or tapemark) is officially "undefined" but could possibly
be read. Depending on
characteristics of the old and new data, the tape drive
and the media, the old data may be physically readable or
an I/O error may occur trying to read it.
If the new data lines up precisely with the old data (or to put
it more precisely, if the inter-record gaps of the old and new data
line up), as in
the following example, the Host would be able to read the old
HDR1, etc. At best, the Host would recognize that the (old) HDR
labels are not what it was expecting and complain about the volume
labels. At worst, the Host would detect no error and use
the tape as is.
In many cases, the old data will not line up precisely with
the new data, leaving fragments of old data following the newly
written data, as in the following example:
In this case, when the Host tries to read past the VOL1
record it will get an I/O error, most likely a data check.
The tape will have to be re-initialized
before it can be used again.
What Happens Next on a MAS
The MAS never returns "undefined" old data following newly
written data. The MAS always erases all old data from the
virtual tape following every write or write tapemark.
Everything past the newly written record is considered "void",
or empty tape. On a MAS, the MVS bug would leave
the virtual tape volume looking as follows:
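In effect, the volume holds a VOL1 label (possibly followed by
HDR labels) and then void. The difference from a real tape can be
modeled with a short Python sketch. This is an illustration only,
not MAS code: the tape is treated as a simple list of records, the
function names are made up for this example, and only the case
where old and new records line up exactly is modeled (the
"fragments" case on a real tape is not).

```python
VOID = "<void>"  # marker for empty tape in this toy model


def real_tape_write(tape, position, new_records):
    """Overwrite records at `position`; older records further along remain."""
    result = list(tape)
    result[position:position + len(new_records)] = new_records
    return result


def mas_write(tape, position, new_records):
    """Overwrite records at `position` and discard everything after them."""
    return list(tape[:position]) + list(new_records)


def read_all(tape):
    """Read forward through every record, then report void at the end."""
    return list(tape) + [VOID]


# A previously used tape with a complete label set and one file.
old = ["VOL1", "HDR1(old)", "HDR2(old)", "TM", "data", "TM"]

# The canceled job writes VOL1 and nothing more before the unload.
print(read_all(real_tape_write(old, 0, ["VOL1"])))  # old labels still follow
print(read_all(mas_write(old, 0, ["VOL1"])))        # VOL1, then void
```

In the model, the real tape still presents the old HDR labels after
the new VOL1, while the MAS volume presents nothing but void.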
When the Host next accesses this tape, it will see the VOL1
label and expect to find the rest of the HDR labels following it.
When it tries to read these non-existent labels, it will
receive an I/O error, with "Tape Void" sense (sense 08, ERPA
code 31). The Host will display the following messages:
IOS000I unit,chp,VOI,02,0E00,,**,volser,jobname
084000310000...
IEC512I I/O ERROR unit,volser,label,job
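If console messages are being scanned by a script, the "Tape Void"
condition can be picked out of the sense data line. The following
Python sketch assumes, based on the sample line above, that the
first sense byte carries the 08 value and the fourth byte carries
the ERPA code 31. The function names are hypothetical, and the
byte positions should be confirmed against the drive's sense
documentation before relying on them.

```python
def parse_sense(hex_text):
    """Convert a hex sense string such as '08400031' into a list of byte values."""
    digits = hex_text.strip().rstrip(".")  # drop the trailing "..." elision
    if len(digits) % 2:
        digits = digits[:-1]               # ignore a dangling half byte
    return list(bytes.fromhex(digits))


def is_tape_void(hex_text):
    """True when byte 0 is 0x08 and byte 3 (assumed ERPA code) is 0x31."""
    sense = parse_sense(hex_text)
    return len(sense) >= 4 and sense[0] == 0x08 and sense[3] == 0x31


print(is_tape_void("084000310000..."))  # sample sense data from the IOS000I message
```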
The Host will then unload the tape and ask for it again:
IEC502E R unit,volser,label,job
IEC501A M unit,volser,label,,job
The MAS will display messages such as:
MAS423W: Tape E791 read past end of data
MAS419E: Device E791 data read returned error
This cycle repeats rapidly and indefinitely, as the Host requests then
rejects the specified volume. This is what the MAS calls a
"mount loop". Normally, the MAS detects mount loops and
halts the cycle after a few iterations. Unfortunately,
up through the current release of MAS (version 3.02), this exact
situation evades the MAS mount loop detection. The Host and MAS
will continue the mount, I/O error, unmount, and remount cycle
forever until some manual intervention occurs. (The mount loop
detection under this condition will be fixed in a future release
of the MAS. This won't fix the underlying problem of the trashed tape
labels, but will prevent the mount loop from running forever if
it occurs.)
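The general idea of mount loop detection can be sketched in
Python. This is not the MAS implementation, which is not described
here. It is a hypothetical detector that counts how often the same
volume is mounted on the same device within a sliding time window.
The class name, the threshold of 3 mounts, and the 60-second window
are all assumptions chosen for illustration.

```python
from collections import deque


class MountLoopDetector:
    """Hypothetical detector: flags a volume mounted too often in a time window."""

    def __init__(self, max_mounts=3, window_seconds=60.0):
        self.max_mounts = max_mounts          # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._history = {}  # (device, volser) -> deque of mount timestamps

    def record_mount(self, device, volser, timestamp):
        """Record a mount; return True if it looks like a mount loop."""
        times = self._history.setdefault((device, volser), deque())
        times.append(timestamp)
        # Drop mounts that fall outside the sliding window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_mounts


detector = MountLoopDetector(max_mounts=3, window_seconds=60.0)
for second in range(5):
    # Simulate the same volume being remounted once per second.
    print(second, detector.record_mount("E791", "BTL202", float(second)))
```

A detector of this kind only stops the cycle. The corrupted labels
on the volume still have to be repaired separately.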
Once a MAS virtual tape volume is corrupted in this way, the only
way to correct it is either to delete the file from the MAS tape
library or to re-initialize it from the Host.
You can tell that a virtual tape volume is in this corrupted
state by doing an "ls" command on the file:
ls -l BTL202
-rw------- 1 vtape vtape 86 May 27 12:24 BTL202
A corrupted file will most likely have a size of 86 bytes
(VOL1 only, with no tapemark). It's also possible that it
could be 172 bytes (VOL1 and HDR1 with no tapemark) or 258 bytes
(VOL1, HDR1, and HDR2 with no tapemark).
You can also verify the size and content with the awsdir
program rather than with "ls". If you display the file
details, you will see that the file display ends with the
warning message: "NOTE NO ENDING TM!".
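The size check can be automated across a whole tape library
directory. The Python sketch below flags files whose size matches
one of the three sizes quoted above (86, 172, or 258 bytes). The
function name is hypothetical, and a size match is only a hint:
confirm each candidate with awsdir before deleting or
re-initializing anything.

```python
import os

LABEL_RECORD_SIZE = 86  # bytes per label record, from the sizes quoted above

# Sizes that suggest a label set with no ending tapemark.
SUSPECT_SIZES = {
    1 * LABEL_RECORD_SIZE: "VOL1 only, no tapemark",
    2 * LABEL_RECORD_SIZE: "VOL1 and HDR1, no tapemark",
    3 * LABEL_RECORD_SIZE: "VOL1, HDR1, and HDR2, no tapemark",
}


def find_suspect_volumes(library_dir):
    """Return {filename: description} for files whose size is a suspect size."""
    suspects = {}
    for entry in sorted(os.listdir(library_dir)):
        path = os.path.join(library_dir, entry)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        description = SUSPECT_SIZES.get(os.path.getsize(path))
        if description:
            suspects[entry] = description
    return suspects
```

Call find_suspect_volumes() with the path of the tape library
directory on the MAS (the path depends on the installation) and
print the returned names and descriptions.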
A Recovery Method Whereby You Don't Have to Cancel the Host Job
If you detect that this problem has occurred and the Host
and MAS are in a mount loop, you can use the following procedure
to correct the problem without having to cancel the Host job:
1. On the MAS operator's console, enter the command
   "unready device",
   where device is the virtual drive having the problem.
   If you caught the device while the corrupted volume was
   loaded, the mount loop will stop, the Host will display
   "Intervention Required" and the tape volume will still be mounted.
   If not, enter the "unready" command
   again until you catch it in this state. Once you have it in the
   mounted but not ready ("NR") state, enter "Unload device"
   to unload it.
2. Open a terminal shell on the MAS, become root, and "rm" the
   corrupted file from the tape library directory.
3. Mount the volume in question with the MAS "Mount volser
   device" command. Because it doesn't exist now, the
   MAS will reinitialize the tape volume with a fresh set of tape
   labels. As soon as the volume is mounted, it will automatically
   come ready and the Host job should resume normally.
Copyright © 2003 by Bus-Tech, Inc.