Metadata & Data Integrity issues observed in certain conditions with VMFS volumes

The objective of this article is to explain the types, causes, and remediation of data integrity issues that may manifest on a VMFS volume.


Types of Corruption

Data integrity issues may arise in two different regions of a volume:

  1. Metadata
  2. Data

The metadata region is the portion of the LUN that holds information about the LUN itself, such as the partition table, LUN information, LVM headers, heartbeat information, and file descriptors. Typically, all the information about the filesystem layout and geometry is stored here.

It typically spans roughly the first 1.5 GB of the LUN.

The data region is self-explanatory: this is where your actual virtual machines, ISO images, and any other files or data you choose to keep are stored.

At this juncture, it is important to understand that VMware *does not* offer data recovery services. This is documented in the following KB article:

http://kb.vmware.com/kb/1015413

Assessment

Now that we understand the two types of corruption, it is also important to understand that some damage can be reversed or remediated, while most cases require restoring from backup or engaging third-party recovery services.

The corruption could be silent, or it could manifest in the following ways:

  • “Resource cluster metadata corruption detected Volume” in the log /var/log/vmkernel
  • “Corrupt lock detected at offset” in the log /var/log/vmkernel
  • cpu0:1031)LVM: 2294: Could not open device , vol [<id>, <id>, 1]: No such partition on target
  • Couldn’t read volume header from <id>: Address temporarily unmapped
  • The datastore goes missing (not mounted), while the device/LUN is still visible in the “Storage Adapters” tab
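The log signatures listed above can be searched for mechanically. The following is a minimal sketch, not a VMware tool: it assumes you have a readable copy of the vmkernel log (the path /var/log/vmkernel is taken from this article and may differ on your ESXi release), and the function names and labels are made up for illustration.

```python
import re

# Message fragments taken from the symptoms listed in this article.
# The labels on the left are illustrative names, not VMware terminology.
SIGNATURES = {
    "resource cluster metadata corruption": re.compile(
        r"Resource cluster metadata corruption detected", re.IGNORECASE),
    "corrupt lock": re.compile(
        r"Corrupt lock detected at offset", re.IGNORECASE),
    "missing partition": re.compile(
        r"Could not open device .*No such partition on target", re.IGNORECASE),
    # "." matches either a straight or a curly apostrophe.
    "unreadable volume header": re.compile(
        r"Couldn.t read volume header from", re.IGNORECASE),
}


def scan_lines(lines):
    """Return (line_number, label, text) for every line matching a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/vmkernel"):
    """Scan a log file; the default path is an assumption from this article."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    for number, label, text in scan_log():
        print(f"{number}: [{label}] {text}")
```

A clean scan does not prove the volume is healthy, since the corruption can be silent; a hit only tells you which symptom to report to support.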

Causes

Data corruption could be attributed to (but is not limited to):

  1. SAN failures (disk failures, RAID failures, etc.)
  2. Rogue servers accessing VMFS volumes
  3. Incorrect zoning/configuration issues
  4. Storage array firmware bugs

Remediation

At this time, the following types of corruption can be fixed:

  1. Partition table loss
  2. Certain types of metadata corruption: heartbeat region corruption and cluster lock corruption

Typically, you would need to engage VMware Technical Support to gauge the level of damage and assess whether it is reversible. They may ask you to run dd against the problematic volume and/or, if you have vSphere 5.1 or later, VOMA.

Refer to the KB articles below for more information:

http://kb.vmware.com/kb/1020645

http://kb.vmware.com/kb/2036767
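To give a feel for what a dd or VOMA request from support might look like, here is a hedged sketch that only builds the command lines and does not execute anything. The device name naa.EXAMPLE, the partition number, and the output path are hypothetical placeholders. The count of 1536 MiB is simply the "around 1.5 GB" metadata region mentioned earlier expressed in 1 MiB blocks, so treat it as an assumption and use the exact values given in the KB articles or by your support engineer.

```python
# Directory where ESXi exposes raw devices.
DISKS_DIR = "/vmfs/devices/disks"


def metadata_dump_command(device, partition, out_file, size_mib=1536):
    """Build a dd command that reads the start of a VMFS partition into a file.

    size_mib defaults to 1536 (1.5 * 1024), an assumption derived from the
    approximate metadata region size quoted in this article.
    """
    source = f"{DISKS_DIR}/{device}:{partition}"
    # if= is the device being read, of= is the dump file. Never swap them.
    return ["dd", f"if={source}", f"of={out_file}", "bs=1M", f"count={size_mib}"]


def voma_check_command(device, partition):
    """Build a VOMA command that checks VMFS metadata on a partition."""
    target = f"{DISKS_DIR}/{device}:{partition}"
    return ["voma", "-m", "vmfs", "-f", "check", "-d", target]


if __name__ == "__main__":
    # Hypothetical device, partition, and output path, for illustration only.
    print(" ".join(metadata_dump_command("naa.EXAMPLE", 1, "/tmp/vmfs-metadata.bin")))
    print(" ".join(voma_check_command("naa.EXAMPLE", 1)))
```

Printing the commands instead of running them is deliberate: on a damaged volume it is safer to review each command with support before executing it, and the dump file must be written to a location other than the affected datastore.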
