2016
-
05
-
17T11
:
50
:
03.390Z
cpu0
:
33401
)
NMP
:
nmp_ThrottleLogForDevice
:
3178
:
Cmd
0x28
(
0x43a58090dec0
,
0
)
to
dev
"naa.5000c50083f4ccaf"
on
path
"vmhba2:C0:T11:L0"
Failed
:
H
:
0x0
D
:
0x2
P
:
0x0
Valid
sense
data
:
0x4
0x32
0x0.
Act
:
NONE
The
SCSi
sense
code
&
nbsp
;
H
:
0x0
D
:
0x2
P
:
0x0
Valid
sense
data
:
0x4
0x32
0x0
decoded
is
:
ASC
/
ASCQ
:
32h
/
00h
ASC
Details
:
NO
DEFECT
SPARE
LOCATION
AVAILABLE
Status
:
02h
-
CHECK
CONDITION
Description
:
&
nbsp
;
When
the
target
returns
a
Check
Condition
in
response
to
a
command
it
is
indicating
that
it
has
entered
a
contingent
allegiance
condition
.
&
nbsp
;
This
means
that
an
error
occurred
when
it
attempted
to
execute
a
SCSI
command
.
The
initiator
usually
then
issues
a
SCSI
Request
Sense
command
in
order
to
obtain
a
Key
Code
Qualifier
(
KCQ
)
from
the
target
.
The status is being returned from the device and indicates a hardware failure despite the iDRAC not reporting any issues. On top of that the ESXi Host Client status of the disk looked normal.
Initially the sense error was being reporting as being be due to a parity error but not a hardware error on the disk meaning that the disk wasn’t going to be replaced. DELL support couldn’t see anything wrong with the disk from the point of view of the FX2s Chassis, Storage Controller or from the ESXi hardware status. DELL got me to install and run the
ESXi PERC command line
onto the hosts (which I found to be a handy utility in it’s own right) which also reporting no physical issues.
As a temporary measure I removed the disk from the disk group and brought the host back out of maintenance mode to allow me to retain cluster resiliency. Later on I updated the
Storage Controller Driver and Firmware to the current supported version
and after that I added the disk back into the VSAN Disk Group and then cloned a VM onto that host ensuring that data was placed on the hosts disk groups. About 5 minutes after copy the host has flagged and this time the disk has been marked with a permanent disk failure.