The final (?) note on the ESXi / HP saga (part 1 / part 2). This is too much of a downer to continue documenting!
I’ll start with a quick tip: if moving data to NFS shares seems slow or gives you frequent timeouts, look to your network gear.
I was having trouble getting ghettoVCB backups onto an NFS share on a Windows 2003 server. The VMware ESXi server would sporadically lose its connection to the 2k3 server, which killed the backup. I finally replaced the little D-Link SOHO gigabit switch with an HP ProCurve and swapped all the sketchy old network cables for shiny new CAT 6 cables. The backups became noticeably faster, and the intermittent connection losses disappeared completely.
Now I can get good backups of 3 of the 6 virtual machines (VMs) on this server. Any kind of file copy also works for those same 3.
The other three? I’m starting to lose faith – I think we’re hosed. The copies or backups always end with a series of errors. The log errors point to the datastore where the VMs currently reside, not the copy destination. Read errors. Ugh.
Tip: I can’t seem to get “thin” backups onto the Windows (or, for that matter, OpenFiler) NFS shares. So regardless of how much data is actually used in that 150 GB virtual disk, I get a full 150 GB backup file. As a workaround, I turned on NTFS compression for the NFS share on the Windows server. It cuts the copy speed almost in half, with barely any extra CPU utilization. It’s worth it, though: it took 280 GB of backups down to 16.5 GB!
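Why does compression help so much? A thick backup writes out every block of the virtual disk, and unused space that reads back as zeros compresses extremely well. The Python sketch below illustrates the effect on a scaled-down, made-up disk image (a few MB of random bytes as “used” data followed by zeros). It uses zlib only as a stand-in; NTFS has its own compression scheme and does not use zlib. The function names and sizes are hypothetical, and the sketch assumes the unused regions really are zeros, which isn’t guaranteed if a guest has written and later deleted data there.

```python
import os
import zlib

MB = 1024 * 1024


def thick_image(used_mb, total_mb):
    """Hypothetical 'thick' disk image: used data followed by zero-filled space."""
    # Used space: random bytes, which are essentially incompressible (worst case).
    used = os.urandom(used_mb * MB)
    # Unused space: zeros, which a non-thin backup still writes out in full.
    unused = bytes((total_mb - used_mb) * MB)
    return used + unused


def compression_ratio(data):
    """Original size divided by compressed size (zlib standing in for NTFS)."""
    # Level 1 is zlib's fastest setting.
    return len(data) / len(zlib.compress(data, 1))


if __name__ == "__main__":
    image = thick_image(used_mb=4, total_mb=60)
    print(f"{len(image) // MB} MB image, ratio about {compression_ratio(image):.1f}:1")
```

With 4 MB of random data in a 60 MB image, the ratio can’t exceed 15:1, because the random portion alone compresses to roughly its own size; how far a real backup shrinks depends on how much of the disk is genuinely empty.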
I have a VMware forum post out there languishing. It did make me verify that I had the latest and greatest firmware, ESXi updates, and HP tools installed, though. It also took me down a few unnecessary paths, but that’s OK, as it was educational. I’ll probably close that post soon and try a much shorter, summarized version. I may have to figure out how to contact paid support.
I also tried a ServerFault.com post, but I think I tried to cover too much territory in it. Face it, many geeks suffer from tl;dr syndrome. I think I’ll close that topic soon as well.
I am trying to get some help from HP now, but this time they’re not so interested in helping. See, at boot time the P400 array controller gives an error 1716, “unrecoverable media error.” HP says, logically enough, that I need to rebuild the array. OK, I’d like to do that, but I want image backups first. They say I should’ve had good backups before I did any drive replacements. Well, that’s a good point. I did… but that was almost two weeks ago! *cough* excuse me… I’d just like fresh backups before I toast the array. These machines are all still in use, and the company hasn’t been standing still.
There doesn’t appear to be a chkdsk or fsck for VMFS-formatted volumes. That seems like it would be useful.
For web searchers dying to share a cure, below I’ve listed some of the error messages.
GhettoVCB errors are:
- Failed to clone disk : Connection timed out (7208969)
- Failed to clone disk : Input/output error (327689)
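A side note for searchers: both numeric codes split cleanly when read as hex (7208969 is 0x6E0009, 327689 is 0x50009), and the high 16 bits match the Linux errno that agrees with each message text (110 is ETIMEDOUT, 5 is EIO). That is only an observation from the arithmetic, not documented VMware behavior, and what the low value 9 means is unknown here. A short Python sketch of the split (the helper name is made up):

```python
# Linux errno numbers and their strerror() text. These are Linux-specific
# values (other operating systems number them differently), so they are
# hardcoded rather than looked up from the running OS.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def split_code(code):
    """Split a 32-bit code into its high and low 16-bit halves."""
    return code >> 16, code & 0xFFFF


if __name__ == "__main__":
    for code in (7208969, 327689):
        high, low = split_code(code)
        name, text = LINUX_ERRNO.get(high, ("?", "unknown"))
        print(f"{code} = {code:#x}: high {high} ({name}, {text}), low {low}")
```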
Sample message log errors:
Jul 15 19:18:50 vmkernel: 0:17:59:18.890 cpu4:16218)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4100051614c0) to NMP device "mpx.vmhba1:C0:T1:L0" failed on physical path "vmhba1:C0:T1:L0" H:0x3 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Jul 15 19:18:50 vmkernel: 0:17:59:18.890 cpu4:16218)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "mpx.vmhba1:C0:T1:L0" state in doubt; requested fast path state update...
Jul 15 19:18:50 vmkernel: 0:17:59:18.890 cpu4:16218)ScsiDeviceIO: 770: Command 0x28 to device "mpx.vmhba1:C0:T1:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Jul 15 19:18:53 vmkernel: 0:17:59:22.500 cpu4:5365)<4>cciss: cmd 0x4100b1402000 has CHECK CONDITION byte 2 = 0x3
Here’s another set:
Jul 15 19:49:08 vmkernel: 0:18:29:37.553 cpu7:10409)<4>cciss: cmd 0x4100b1402000 has CHECK CONDITION byte 2 = 0x3
Jul 15 19:49:08 vmkernel: 0:18:29:37.559 cpu7:10409)NMP: nmp_CompleteCommandForPath: Command 0x28 (0x4100050fa000) to NMP device "mpx.vmhba1:C0:T1:L0" failed on physical path "vmhba1:C0:T1:L0" H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 15 19:49:08 vmkernel: 0:18:29:37.559 cpu7:10409)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "mpx.vmhba1:C0:T1:L0" state in doubt; requested fast path state update...
Jul 15 19:49:08 vmkernel: 0:18:29:37.559 cpu7:10409)ScsiDeviceIO: 770: Command 0x28 to device "mpx.vmhba1:C0:T1:L0" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 15 19:49:08 vmkernel: 0:18:29:37.559 cpu6:312122)Fil3: 5354: Sync READ error ('EFops_wsus-flat.vmdk') (ioFlags:
: Timeout
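For anyone decoding similar messages, here is a small Python sketch that turns the H:/D:/P: status fields and sense bytes into names. The lookup tables are deliberately partial: SCSI status codes, sense keys and ASC/ASCQ come from the SCSI standard, and the H: values 0x0 to 0x8 follow the Linux host-byte (DID_*) numbering. Treat them as a starting point to check against VMware’s documentation, not an authoritative decoder; the function names are hypothetical. Read this way, the lines above describe a READ(10) (opcode 0x28) that failed with a host-side timeout (H:0x3), while the cciss driver reported sense key 0x3, MEDIUM ERROR, which fits the P400’s 1716 error. The NMP sense bytes are only labeled “Possible” and are generally only meaningful when D: is 0x2 (CHECK CONDITION); here D: is 0x0, so the “NOT READY / MEDIUM NOT PRESENT” reading of the first set probably shouldn’t be trusted.

```python
import re

# Host status ("H:"); only codes 0x0-0x8, which follow the Linux DID_* numbering.
HOST_STATUS = {
    0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
    0x4: "BAD_TARGET", 0x5: "ABORT", 0x6: "PARITY", 0x7: "ERROR", 0x8: "RESET",
}
# SCSI status byte ("D:").
DEVICE_STATUS = {
    0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
    0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL",
}
# SCSI sense keys (low 4 bits of byte 2 in fixed-format sense data).
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND",
}
# Only the additional sense codes that appear in the logs above.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

HEX = r"(0x[0-9a-fA-F]+)"
NMP_RE = re.compile(
    rf"Command {HEX}.*?H:{HEX} D:{HEX} P:{HEX} Possible sense data: {HEX} {HEX} {HEX}"
)
CCISS_RE = re.compile(rf"CHECK CONDITION byte 2 = {HEX}")


def decode_nmp(line):
    """Decode an NMP/ScsiDeviceIO failure line into readable fields, or None."""
    m = NMP_RE.search(line)
    if m is None:
        return None
    op, host, dev, plugin, key, asc, ascq = (int(g, 16) for g in m.groups())
    return {
        "command": OPCODES.get(op, hex(op)),
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(dev, hex(dev)),
        "plugin": "GOOD" if plugin == 0 else hex(plugin),
        # "Possible" sense data: only trustworthy when device is CHECK CONDITION.
        "sense_key": SENSE_KEYS.get(key & 0x0F, hex(key)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:#04x}/{ascq:#04x}"),
    }


def decode_cciss(line):
    """Return the sense key named in a cciss CHECK CONDITION line, or None."""
    m = CCISS_RE.search(line)
    if m is None:
        return None
    key = int(m.group(1), 16) & 0x0F  # sense key is the low nibble of byte 2
    return SENSE_KEYS.get(key, hex(key))


if __name__ == "__main__":
    samples = [
        'ScsiDeviceIO: 770: Command 0x28 to device "mpx.vmhba1:C0:T1:L0" failed '
        "H:0x3 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.",
        "cciss: cmd 0x4100b1402000 has CHECK CONDITION byte 2 = 0x3",
    ]
    for s in samples:
        print(decode_nmp(s) or decode_cciss(s))
```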
Unless I can figure out a way to get those last 3 VMs imaged or copied (or an alternative way to fix the read errors), I see a long weekend of rebuilding machines in my future. Fortunately, I can still get all the data from the VMs; I just can’t copy the VMs directly!