The objective of this article is to explain the types, causes, and remediation of data integrity issues that may manifest on a VMFS volume.
Types of Corruption
Data integrity issues may arise in two different regions of a volume:
The metadata region is the portion of the LUN that contains information about the LUN itself, such as the partition table, LUN information, LVM headers, heartbeat information, and file descriptors. Typically, all information about the filesystem layout and geometry is stored here.
This region typically spans roughly the first 1.5 GB of the LUN.
The data region is self-explanatory: this is where your actual virtual machines, ISO images, and any other files or data you choose to keep are stored.
At this juncture, it is important to understand that VMware *does not* offer data recovery services; this is documented in a VMware KB article.
Now that the two types of corruption are clear, it is also important to understand that some damage can be reversed or remediated, while most cases require restoring from backup or engaging third-party services for recovery.
The corruption may be silent, or it may manifest in the following ways:
- "Resource cluster metadata corruption detected" entries for the volume in "/var/log/vmkernel"
- "Corrupt lock detected at offset" entries in "/var/log/vmkernel"
- "cpu0:1031)LVM: 2294: Could not open device , vol [<id>, <id>, 1]: No such partition on target"
- "Couldn't read volume header from <id>: Address temporarily unmapped"
- Datastore missing (not mounted), while the device/LUN is still visible in the "Storage Adapters" tab
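When triaging, the vmkernel log can be searched for these signatures. The sketch below is a minimal, self-contained illustration: it writes a sample log line to a temporary file (the entry and path are illustrative, not captured from a real host) and then greps for the common corruption messages; on a live host you would point grep at the actual vmkernel log instead.

```shell
# Illustrative sample log line (hypothetical; stands in for /var/log/vmkernel).
LOG=/tmp/vmkernel.sample
printf '%s\n' 'cpu2:1031)FS3: Corrupt lock detected at offset 1234567' > "$LOG"

# Search for common VMFS corruption signatures, case-insensitively.
grep -Ei 'corrupt lock detected|resource cluster metadata corruption' "$LOG"
```

If the grep returns matches, capture the surrounding log context before engaging support.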
Data corruption can be attributed to (but is not limited to):
- SAN failures, such as disk or RAID failures
- Rogue servers accessing VMFS volumes
- Incorrect zoning/configuration issues
- Storage array firmware bugs
At this time, the following types of corruption can be fixed:
- Partition table loss
- Certain types of metadata corruption, such as heartbeat region corruption and cluster lock corruption
Typically, you will need to engage VMware Technical Support to gauge the level of damage and assess whether it is reversible. They may request that you run dd against the problematic volume, and/or VOMA if you are on vSphere 5.1 or later.
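If support requests a dd capture, the general shape of the command is shown below. This is a sketch, not a support procedure: the device path is a hypothetical stand-in, and the block creates a small dummy file so the example is self-contained. On a real host you would point `if=` at the affected device (a path under /vmfs/devices/disks/) and capture whatever size support asks for, and you would only do so under their guidance.

```shell
# Stand-in "device" so the sketch runs anywhere; on ESXi this would be the
# affected LUN, e.g. /vmfs/devices/disks/naa.<id> (hypothetical path).
DEVICE=/tmp/fake_lun
dd if=/dev/zero of="$DEVICE" bs=1M count=8 2>/dev/null

# Capture the initial region of the device for offline analysis by support.
# On a real volume this region is much larger (the metadata spans roughly
# the first 1.5 GB, per the description above).
dd if="$DEVICE" of=/tmp/metadata-dump.bin bs=1M count=8 2>/dev/null

# On vSphere 5.1+, a read-only metadata check with VOMA (ESXi only, so it is
# left commented here) would look like:
#   voma -m vmfs -d /vmfs/devices/disks/naa.<id>:1
ls -l /tmp/metadata-dump.bin
```

The dump file, not the live volume, is what gets handed to support for analysis.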
Refer to the KB articles below for more information: