Knowledge Base
Mainframe Appliance for Storage

Corrupted Tape Labels Cause MAS Mount Loop on MVS


There is a bug in MVS whereby you can trash the header label set on a tape by canceling a job at exactly the right time while an output tape is being opened. The cause of this bug has nothing to do with the MAS, but if MVS does this to one of the MAS virtual tape volumes, it can cause operational problems that require manual intervention to resolve.

What happens is that one or more of the volume header labels gets written, and then the tape is unloaded without the label set being finished or the tape being closed properly. Once the job has been canceled, the tape contains a valid VOL1 and possibly one or more HDRn labels, but no tapemark(s) ending the header label set.

What Happens Next on a Real Tape

On a real tape, this would cause unpredictable results. When the tape is next accessed, there might be an I/O error, a complaint from the operating system about the label contents, or no error at all. It all depends on the prior contents of the tape and a bit of luck. This is because when you write to a tape, the old data on the tape media remains in place following the newly written data (or tapemark). Anything past the newly written data (or tapemark) is officially "undefined" but could possibly be read. Depending on characteristics of the old and new data, the tape drive and the media, the old data may be physically readable or an I/O error may occur trying to read it.

If the new data lines up precisely with the old data (or to put it more precisely, if the inter-record gaps of the old and new data line up), as in the following example, the Host would be able to read the old HDR1, etc. At best, the Host would recognize that the (old) HDR labels are not what it was expecting and complain about the volume labels. At worst, the Host would detect no error and use the tape as is.

In many cases, the old data will not line up precisely with the new data, leaving fragments of old data following the newly written data, as in the following example:

In this case, when the Host tries to read past the VOL1 record it will get an I/O error, most likely a data check. The tape will have to be re-initialized before it can be used again.

What Happens Next on a MAS

The MAS never returns "undefined" old data following newly written data. The MAS always erases all old data from the virtual tape following every write or write tapemark. Everything past the newly written record is considered "void", or empty tape. On a MAS, the MVS bug would leave the virtual tape volume looking as follows: a valid VOL1, possibly one or more HDRn labels, no tapemark, and then void.
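As a rough illustration of the difference, the toy model below treats a tape as a list of records. It is a sketch only, not a description of how a drive or the MAS is implemented. It covers just the case where old and new records line up exactly, and the function names and record strings are made up for the example.

```python
def write_real_tape(tape, position, record):
    """Toy model of a real tape: overwrite one record in place.

    Older records past the written one stay on the media (officially
    "undefined", but possibly still readable).
    """
    new_tape = list(tape)
    if position < len(new_tape):
        new_tape[position] = record
    else:
        new_tape.append(record)
    return new_tape


def write_mas_tape(tape, position, record):
    """Toy model of a MAS virtual tape: everything past the write is void."""
    return list(tape[:position]) + [record]


# Hypothetical prior contents of a labeled tape.
old = ["VOL1 (old)", "HDR1 (old)", "HDR2 (old)", "TM"]

real = write_real_tape(old, 0, "VOL1 (new)")
mas = write_mas_tape(old, 0, "VOL1 (new)")

print(real)  # the old HDR labels and tapemark are still present
print(mas)   # only the new VOL1 remains; the rest is void
```

In this model the real tape still presents the old HDR1 after the new VOL1, while the MAS volume simply ends, which is the situation that produces the "Tape Void" error.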

When the Host next accesses this tape, it will see the VOL1 label and expect to find the rest of the HDR labels following it. When it tries to read these non-existent labels, it will receive an I/O error, with "Tape Void" sense (sense 08, ERPA code 31). The Host will display the following messages:

IOS000I unit,chp,VOI,02,0E00,,**,volser,jobname
        084000310000...
IEC512I I/O ERROR unit,volser,label,job

The Host will then unload the tape and ask for it again:

IEC502E R unit,volser,label,job
IEC501A M unit,volser,label,,job

The MAS will display messages such as:

MAS423W: Tape E791 read past end of data
MAS419E: Device E791 data read returned error

This cycle repeats rapidly and indefinitely, as the Host requests and then rejects the specified volume. This is what the MAS calls a "mount loop". Normally, the MAS detects mount loops and halts the cycle after a few iterations. Unfortunately, up through the current release of MAS (version 3.02), this exact situation evades the MAS mount loop detection. The Host and MAS will continue the mount, I/O error, unmount, and remount cycle until some manual intervention occurs. (The mount loop detection under this condition will be fixed in a future release of the MAS. This won't fix the underlying problem of the trashed tape labels, but will prevent the mount loop from running forever if it occurs.)
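Until that fix is available, the loop can be spotted from a captured Host console log. The sketch below is one possible approach, not a Bus-Tech tool. It assumes the log is available as text lines, that IEC512I appears in the form shown above, and that three repeats for the same unit and volser is a reasonable threshold. The sample lines, the label type "SL", and the job name "MYJOB" are invented for illustration.

```python
import re
from collections import Counter

# Matches "IEC512I I/O ERROR unit,volser,label,job" as shown in the text.
IEC512I = re.compile(r"IEC512I I/O ERROR (?P<unit>[^,\s]+),(?P<volser>[^,\s]+),")


def find_mount_loops(lines, threshold=3):
    """Return {(unit, volser): count} for pairs with at least `threshold` I/O errors.

    `threshold` is an arbitrary value chosen for this sketch.
    """
    counts = Counter()
    for line in lines:
        match = IEC512I.search(line)
        if match:
            counts[(match["unit"], match["volser"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Invented sample log: the same volume fails three times on the same unit.
sample_log = [
    "IEC512I I/O ERROR E791,BTL202,SL,MYJOB",
    "IEC502E R E791,BTL202,SL,MYJOB",
    "IEC501A M E791,BTL202,SL,,MYJOB",
    "IEC512I I/O ERROR E791,BTL202,SL,MYJOB",
    "IEC502E R E791,BTL202,SL,MYJOB",
    "IEC501A M E791,BTL202,SL,,MYJOB",
    "IEC512I I/O ERROR E791,BTL202,SL,MYJOB",
]

print(find_mount_loops(sample_log))
```

A single IEC512I for a volume is not flagged. Only repeated errors on the same unit and volser are, which matches the request-and-reject cycle described above.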

Once a MAS virtual tape volume is corrupted in this way, the only way to correct it is either to delete the file from the MAS tape library or to re-initialize it from the Host.

You can tell that a virtual tape volume is in this corrupted state by doing an "ls" command on the file:

ls -l BTL202
-rw-------    1    vtape  vtape   86    May 27 12:24 BTL202

A corrupted file will most likely have a size of 86 bytes (VOL1 only, with no tapemark). It's also possible that it could be 172 bytes (VOL1 and HDR1 with no tapemark) or 258 bytes (VOL1, HDR1, and HDR2 with no tapemark).
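To check a whole tape library for these sizes at once, something like the following sketch could be used. It is only a size heuristic built from the three sizes listed above. The function name is hypothetical, the library directory is whatever path your MAS uses (none is assumed here), and a match should still be confirmed before any file is deleted.

```python
import os

# Sizes taken from the text: header labels written with no ending tapemark.
SUSPECT_SIZES = {
    86: "VOL1 only",
    172: "VOL1 and HDR1",
    258: "VOL1, HDR1, and HDR2",
}


def find_suspect_volumes(library_dir):
    """Return a sorted list of (filename, size, description) for suspect files."""
    suspects = []
    for entry in os.scandir(library_dir):
        if not entry.is_file():
            continue
        size = entry.stat().st_size
        if size in SUSPECT_SIZES:
            suspects.append((entry.name, size, SUSPECT_SIZES[size]))
    return sorted(suspects)
```

Calling `find_suspect_volumes` with the tape library directory returns one tuple per suspect file, for example a file named BTL202 of 86 bytes would be reported as "VOL1 only".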

You can also verify the size and content with the awsdir program rather than with "ls". If you display the file details, you will see that the file display ends with the warning message: "NOTE NO ENDING TM!".
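The same check can be sketched in code, with a significant caveat: this article does not describe the MAS on-disk format. The sketch assumes an AWSTAPE-style layout, in which each block carries a 6-byte header (current length, previous length, two flag bytes) and a tapemark is a header with the 0x40 flag bit set. The 86-byte size for one 80-byte label is consistent with that assumption, but it is not confirmed here, and the function is not a reimplementation of awsdir.

```python
import struct

HEADER_LEN = 6        # assumed AWSTAPE-style block header length
FLAG_TAPEMARK = 0x40  # assumed tapemark bit in the first flag byte


def ends_with_tapemark(image: bytes) -> bool:
    """Walk the blocks of an AWSTAPE-style image.

    Returns True only if the last complete block header is a tapemark.
    """
    pos = 0
    last_was_tapemark = False
    while pos + HEADER_LEN <= len(image):
        # Little-endian: current length, previous length, flag byte 1, flag byte 2.
        cur_len, _prev_len, flags1, _flags2 = struct.unpack_from("<HHBB", image, pos)
        last_was_tapemark = bool(flags1 & FLAG_TAPEMARK)
        pos += HEADER_LEN + cur_len
    return last_was_tapemark


# Build an 86-byte image: one 80-byte record with no tapemark after it.
vol1_only = struct.pack("<HHBB", 80, 0, 0xA0, 0) + b" " * 80
# The same image followed by a tapemark header.
vol1_plus_tm = vol1_only + struct.pack("<HHBB", 0, 80, 0x40, 0)

print(len(vol1_only), ends_with_tapemark(vol1_only))
print(len(vol1_plus_tm), ends_with_tapemark(vol1_plus_tm))
```

Under these assumptions, a volume in the corrupted state described above would return False, which corresponds to the "NOTE NO ENDING TM!" warning.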

A Recovery Method Whereby You Don't Have to Cancel the Host Job

If you detect that this problem has occurred and the Host and MAS are in a mount loop, you can use the following procedure to correct the problem without having to cancel the Host job:

On the MAS operator's console, enter the command "unready device", where device is the virtual drive having the problem.

If you caught the device while the corrupted volume was loaded, the mount loop will stop, the Host will display "Intervention Required" and the tape volume will still be mounted. If not, enter the "unready" command again until you catch it in this state. Once you have it in the mounted but not ready ("NR") state, enter "Unload device" to unload it.

Open a terminal shell on the MAS, become root, and "rm" the corrupted file from the tape library directory.

Mount the volume in question with the MAS "Mount volser device" command. Because it doesn't exist now, the MAS will reinitialize the tape volume with a fresh set of tape labels. As soon as the volume is mounted, it will automatically come ready and the Host job should resume normally.



Copyright © 2003 by Bus-Tech, Inc.