14.2.9 Checking for a Failed Disk Drive


This section describes how to check for a failure in the internal disk drives that make up a RAID volume.
You can identify a faulty disk drive by using any one of the methods listed below, or a combination of them.
Checking the LEDs of Disk Drives
If a failure occurs in any of the disk drives in the system, the CHECK LED (amber) on the front of that disk drive goes on. With this CHECK LED, you can identify the disk drive where the failure occurred in the system. For the locations and detailed descriptions of these LEDs, see "Understanding the System Components" in the Service Manual for your server.
Checking an Error Message
If a failure occurs in a disk drive, the console screen displays an error message. You can also check for such a message by opening the /var/adm/messages file.
The following procedure describes how to identify the slot of a failed disk drive in a hardware RAID volume, or of a failed hot spare disk drive.
  1. Check the DevHandle value of the failed disk drive.
    In the following example of an error message, the value is "11" (DevHandle 0x11).
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 PhysDiskNum 1 with DevHandle 0x11 in slot 0 for enclosure with handle 0x0 is now offline
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 PhysDiskNum 1 with DevHandle 0x11 in slot 0 for enclosure with handle 0x0 is now , active, out of sync, write cache enabled
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 PhysDiskNum 1 with DevHandle 0x11 in slot 0 for enclosure with handle 0x0 is now , active, out of sync
Jun 10 16:33:33 A4U4S429-D0 scsi: WARNING: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 286 is degraded
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 0 is now degraded
Jun 10 16:33:33 A4U4S429-D0 scsi: WARNING: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 286 is degraded
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 0 is now , enabled, active, data scrub in progress
Jun 10 16:33:33 A4U4S429-D0 scsi: WARNING: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 286 is degraded
Jun 10 16:33:33 A4U4S429-D0 scsi: /pci@8000/pci@4/pci@0/pci@0/scsi@0 (mpt_sas0):
Jun 10 16:33:33 A4U4S429-D0 Volume 0 is now , enabled, active, data scrub in progress
  2. Check the output of the show-children command that you recorded beforehand.
    Find the Target whose value matches the DevHandle value obtained in step 1. The PhyNum value of that Target indicates the slot of the failed disk drive.
    In the following example, the PhyNum value of Target 11 is 7, so it can be determined that the failed disk drive is in slot 7.
{0} ok show-children
:
Omitted
:
Target 11
Unit 0 Disk TOSHIBA MBF2600RC 3706
1172123568 Blocks, 600 GB
SASDeviceName 50000393c813ae72 SASAddress 50000393c813ae74
PhyNum 7
:
Omitted
:
Note - The DevHandle value changes whenever the mounting status of a disk drive changes.
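The two-step lookup above can also be sketched as a small script. The following Python example is only an illustration, not part of any supplied tool: the function names are hypothetical, and it assumes that the console messages and the previously recorded show-children output are available as plain text, and that the Target value is printed in hexadecimal so that it can be compared directly with the DevHandle value (as in the example, where DevHandle 0x11 corresponds to Target 11).

```python
import re

# Matches the "is now offline" message and captures the DevHandle hex digits.
DEVHANDLE_RE = re.compile(r"DevHandle\s+0x([0-9a-fA-F]+)\b.*\bis now offline")
# Matches a "Target <value>" line in the recorded show-children output.
# Assumption: the Target value is hexadecimal, like the DevHandle value.
TARGET_RE = re.compile(r"^\s*Target\s+([0-9a-fA-F]+)\s*$")
# Matches "PhyNum <value>", whether on its own line or at the end of a line.
PHYNUM_RE = re.compile(r"\bPhyNum\s+(\S+)")


def offline_devhandles(log_text):
    """Return the DevHandle values reported offline, without duplicates."""
    found = []
    for line in log_text.splitlines():
        match = DEVHANDLE_RE.search(line)
        if match:
            value = int(match.group(1), 16)
            if value not in found:
                found.append(value)
    return found


def target_to_phynum(show_children_text):
    """Map each Target value to the PhyNum string listed under it."""
    mapping = {}
    current_target = None
    for line in show_children_text.splitlines():
        target = TARGET_RE.match(line)
        if target:
            current_target = int(target.group(1), 16)
            continue
        phynum = PHYNUM_RE.search(line)
        if phynum and current_target is not None:
            mapping[current_target] = phynum.group(1)
            current_target = None
    return mapping


def failed_slots(log_text, show_children_text):
    """Return {DevHandle: PhyNum}; PhyNum is None if no Target matches."""
    slots = target_to_phynum(show_children_text)
    return {dev: slots.get(dev) for dev in offline_devhandles(log_text)}


# Sample input copied from the examples in this section.
SAMPLE_LOG = (
    "Jun 10 16:33:33 A4U4S429-D0 PhysDiskNum 1 with DevHandle 0x11 "
    "in slot 0 for enclosure with handle 0x0 is now offline"
)
SAMPLE_SHOW_CHILDREN = """Target 11
Unit 0 Disk TOSHIBA MBF2600RC 3706
1172123568 Blocks, 600 GB
SASDeviceName 50000393c813ae72 SASAddress 50000393c813ae74
PhyNum 7
"""

for dev, slot in failed_slots(SAMPLE_LOG, SAMPLE_SHOW_CHILDREN).items():
    print(f"DevHandle 0x{dev:x} -> slot {slot}")  # DevHandle 0x11 -> slot 7
```

Because the DevHandle value changes when the mounting status of a disk drive changes, a result from such a script is only meaningful if the show-children output was recorded in the same mounting state as the one in which the error occurred.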
Displaying the Status With the FCode Utility
To check for a disk drive failure, stop the system and use the show-volumes command of the FCode utility.

For details, see "14.2.8 Checking the Status of a Hardware RAID Volume and a Disk Drive."
Displaying the Status With the SAS2IRCU Utility
You can also use the SAS2IRCU utility to check for a disk drive failure.
For details on the SAS2IRCU utility, see the beginning of "14.2 Configuring Hardware RAID." For the status displayed by the SAS2IRCU utility, see "14.2.8 Checking the Status of a Hardware RAID Volume and a Disk Drive." For display examples, see "Appendix F SAS2IRCU Utility Command Examples."