4.2.2 Identifying a Failure
This section describes how to identify a failure. Use the troubleshooting flow described in "4.2.1 Determining the Causes of Failures" to determine the appropriate way of checking for a failure.
Checking the LED Indications
Check the LEDs on the operation panel, rear panel, and each component to identify the FRU requiring maintenance. Check the status of a FRU from its LED before starting maintenance work on the FRU.
- Operation panel LEDs
You can determine the status of the system by checking the LEDs on the operation panel. For details, see "2.3.1 Operation Panel LEDs."
- Rear panel LED
You can determine the status of the system by checking the CHECK LED on the rear panel of the chassis, which duplicates the CHECK LED on the operation panel. For details, see "2.3.2 LEDs on the Rear Panel (System Locator)."
- LED of each FRU
If an error occurs in the hardware in the chassis, you can determine the location of the error by checking the LED of the FRU that incorporates the failed hardware. For details, see "2.3.3 LEDs on Each Component."
Note that some FRUs, such as memory, do not have LEDs. To check the status of such a FRU, execute XSCF shell commands such as showhardconf from the maintenance terminal. For details, see "Checking the FRU Status" below.
Checking Error Messages
Display error messages to check the log information and obtain an error overview.
You can use either of the following two methods to check the error messages:
- Checking error log information using the XSCF shell
For details, see "12.1 Checking a Log Saved by the XSCF" in the Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide.
- Checking messages with Oracle Solaris
For details, see "12.2 Checking Warning and Notification Messages" in the Fujitsu SPARC M12 and Fujitsu M10/SPARC M10 System Operation and Administration Guide.
Checking the FRU Status
Execute XSCF firmware commands to determine the system hardware configuration and the status of each FRU.
- showhardconf command
Execute the showhardconf command to display the list of FRUs and their status.
- Log in to the XSCF shell.
- Execute the showhardconf command to check the FRU list.
A faulty FRU is indicated by an asterisk (*) at the beginning of the line.
The following example shows execution on the SPARC M10-4S.
```
XSCF> showhardconf
SPARC M10-4S;
    + Serial:2081230011; Operator_Panel_Switch:Locked;
    + System_Power:On; System_Phase:Cabinet Power On;
    Partition#0 PPAR_Status:Powered Off;
    Partition#1 PPAR_Status:Initialization Phase;
    BB#00 Status:Normal; Role:Slave; Ver:2003h; Serial:2081231002;
        + FRU-Part-Number:CA07361-D202 A1 ;
        + Power_Supply_System:Single;
        + Memory_Size:256 GB;
------------------------Omitted------------------------
    XBBOX#80 Status:Normal; Role:Master; Ver:0101h; Serial:7867000297;
        + FRU-Part-Number:CA07361-D011 A0 /NOT-FIXD-01 ;
        + Power_Supply_System:Single;
        XBU#0 Status:Normal; Serial:PP0629L068
            + FRU-Part-Number:CA20393-B50X A2 ;
            + Type: A ;
*           CBL#L0 Status:Degraded;
                + FRU-Part-Number:2123628-2 ; Ver:3820h;
                + Type:Optic; Length: 3;
                + FRU-Part-Number:2123628-2 ; Ver:3820h;
                + Type:Optic; Length: 3;
------------------------Omitted----------------------
```
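When the showhardconf listing is long, the flagged lines can be picked out programmatically from a saved copy of the output. The following is a minimal Python sketch, assuming the output has been captured as text from the maintenance terminal with one FRU entry per line and the asterisk in the first column; `find_faulty_frus` and `FLAGGED_RE` are hypothetical names, not part of the XSCF firmware.

```python
import re

# Matches a flagged line such as "*           CBL#L0 Status:Degraded;".
FLAGGED_RE = re.compile(r"\*\s*(\S+)\s+Status:([^;]+);")


def find_faulty_frus(showhardconf_output: str) -> list[tuple[str, str]]:
    """Return (FRU name, status) pairs for lines that start with '*'.

    Hypothetical helper: it only scans text already captured from
    showhardconf and assumes one FRU entry per line.
    """
    faulty = []
    for line in showhardconf_output.splitlines():
        if not line.startswith("*"):
            continue  # only lines flagged in the first column indicate a faulty FRU
        match = FLAGGED_RE.match(line)
        if match:
            faulty.append((match.group(1), match.group(2).strip()))
    return faulty


sample = """\
        XBU#0 Status:Normal; Serial:PP0629L068
            + Type: A ;
*           CBL#L0 Status:Degraded;
                + Type:Optic; Length: 3;
"""
print(find_faulty_frus(sample))  # [('CBL#L0', 'Degraded')]
```

The sketch keys on the leading asterisk rather than on the status text, because the asterisk is the marker this section defines for a faulty FRU.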
- showstatus command
Execute the showstatus command to check the FRU status.
- Log in to the XSCF shell.
- Execute the showstatus command to check the status.
A faulty FRU is indicated by an asterisk (*) at the beginning of the line.
```
XSCF> showstatus
    XBBOX#80;
*       PSU#0 Status:Faulted;
```
The FRU status is displayed after the "Status:" string.
Table 4-3 describes the FRU status.
Display | Description |
---|---|
Normal | The unit is in the normal state. |
Faulted | The unit is faulty and is not operating. |
Degraded | A part of the unit has failed or degraded, but the unit is running. |
Deconfigured | The unit itself has no problem, but it and the components in its underlying layer have been degraded due to the failure or degradation of another unit. |
Maintenance | Maintenance is being performed. The replacefru, addfru, or initbb command is being executed. |
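To relate each line of showstatus output to the meanings in Table 4-3, the "Status:" value can be extracted and looked up. This is a minimal Python sketch, assuming the showstatus output has been saved as text; `parse_showstatus`, `FRU_STATUS`, and `STATUS_RE` are hypothetical names, and the dictionary text only summarizes Table 4-3.

```python
import re

# Status values and meanings summarized from Table 4-3.
FRU_STATUS = {
    "Normal": "The unit is in the normal state.",
    "Faulted": "The unit is faulty and is not operating.",
    "Degraded": "Part of the unit has failed or degraded, but the unit is running.",
    "Deconfigured": "Degraded because another unit failed; the unit itself has no problem.",
    "Maintenance": "replacefru, addfru, or initbb is being executed.",
}

# The FRU name is the non-blank token immediately before "Status:".
STATUS_RE = re.compile(r"(\S+)\s+Status:([^;]+);")


def parse_showstatus(output: str) -> list[dict]:
    """Hypothetical helper: turn captured showstatus text into records."""
    records = []
    for line in output.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue  # e.g. the "XBBOX#80;" line names a parent unit, not a status
        status = match.group(2).strip()
        records.append({
            "fru": match.group(1),
            "status": status,
            "flagged": line.startswith("*"),  # asterisk in the first column
            "meaning": FRU_STATUS.get(status, "Status not listed in Table 4-3"),
        })
    return records


sample = "XSCF> showstatus\n    XBBOX#80;\n*       PSU#0 Status:Faulted;\n"
for rec in parse_showstatus(sample):
    print(rec["fru"], rec["status"], "-", rec["meaning"])
# PSU#0 Faulted - The unit is faulty and is not operating.
```

Unknown status strings are reported rather than dropped, so a value not covered by Table 4-3 still stands out.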
Checking Log Information
Execute the showlogs command to check error log information.
- Log in to the XSCF shell.
- Execute the showlogs command to check the log information.
The log information is listed in order of date, with the oldest appearing first.
The following example shows that an Alarm status occurred in PSU#1 at 12:45:31 on Oct 20, and the Alarm status changed to a Warning status at 15:45:31 on the same day.
```
XSCF> showlogs error
Date: Oct 20 12:45:31 JST 2012
    Code: 00112233-445566778899aabbcc-8899aabbcceeff0011223344
    Status: Alarm              Occurred: Oct 20 12:45:31.000 JST 2012
    FRU: /PSU#1
    Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
Date: Oct 20 15:45:31 JST 2012
    Code: 00112233-445566778899aabbcc-8899aabbcceeff0011223344
    Status: Warning            Occurred: Oct 20 15:45:31.000 JST 2012
    FRU: /PSU#1
    Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
```
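Because every error log entry begins with a "Date:" line followed by the same set of fields, a saved copy of the log can be split into one record per entry, which makes it easy to follow a FRU's status over time (here, Alarm followed by Warning for PSU#1). The following is a minimal Python sketch, assuming the field names in the example above and that these names do not appear inside the message text; `parse_error_log`, `FIELDS`, and `FIELD_RE` are hypothetical names.

```python
import re

# Field names taken from the showlogs error example above.
FIELDS = ("Date", "Code", "Status", "Occurred", "FRU", "Msg")
# A field name counts only at the start of a line or after whitespace,
# so "Status:" and "Occurred:" can share one line.
FIELD_RE = re.compile(r"(?:^|(?<=\s))(" + "|".join(FIELDS) + r"):\s*")


def parse_error_log(output: str) -> list[dict]:
    """Hypothetical helper: split 'showlogs error' text into one dict per entry."""
    entries = []
    for line in output.splitlines():
        parts = FIELD_RE.split(line.strip())
        # re.split keeps the captured names: [prefix, name1, value1, name2, value2, ...]
        for name, value in zip(parts[1::2], parts[2::2]):
            if name == "Date":
                entries.append({})  # each entry begins with a Date: line
            if entries:
                entries[-1][name] = value.strip()
    return entries


sample = """\
XSCF> showlogs error
Date: Oct 20 12:45:31 JST 2012
    Code: 00112233-445566778899aabbcc-8899aabbcceeff0011223344
    Status: Alarm              Occurred: Oct 20 12:45:31.000 JST 2012
    FRU: /PSU#1
    Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
Date: Oct 20 15:45:31 JST 2012
    Code: 00112233-445566778899aabbcc-8899aabbcceeff0011223344
    Status: Warning            Occurred: Oct 20 15:45:31.000 JST 2012
    FRU: /PSU#1
    Msg: ACFAIL occurred (ACS=3)(FEP type = A1)
"""
for entry in parse_error_log(sample):
    print(entry["Date"], entry["FRU"], entry["Status"])
# Oct 20 12:45:31 JST 2012 /PSU#1 Alarm
# Oct 20 15:45:31 JST 2012 /PSU#1 Warning
```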
Table 4-4 shows what log information each operand of the showlogs command can display.
Operand | Description |
---|---|
error | Lists the error log. |
event | Lists the event log. |
power | Lists the power log. |
env | Lists the temperature history. |
monitor | Lists the monitoring message log. |
console | Lists the console message log. |
ipl | Lists the IPL message log. |
panic | Lists the panic message log. |
Checking the Messages Output by the Predictive Self-Repairing Tool
Check the messages output by Oracle Solaris Fault Manager, the predictive self-repairing tool that runs on Oracle Solaris. Oracle Solaris Fault Manager has the following functions:
- Receives telemetry information about errors.
- Performs troubleshooting.
- Disables the FRUs that have experienced errors.
- Turns on the LED of a FRU that has experienced an error and displays the details in a system console message.
Table 4-5 shows the fields of a typical message generated when an error occurs. Such messages indicate that the fault has already been diagnosed and that any corrective actions the system can take have already been taken. If the system is running, corrective actions continue to be applied.
Messages are displayed on the console and are recorded in the /var/adm/messages file.
Output Displayed | Description |
---|---|
EVENT-TIME: Thu Apr 19 10:48:39 JST 2012 | EVENT-TIME: Time stamp of the diagnosis |
PLATFORM: ORCL,SPARC64-X, CSN: PP115300MX, HOSTNAME: 4S-LGA12-D0 | PLATFORM: Description of the chassis in which the error occurred |
SOURCE: eft, REV: 1.16 | SOURCE: Information regarding the diagnosis engine used to identify the error |
EVENT-ID: fcbb42a5-47c3-c9c5-f0b0-f782d69afb01 | EVENT-ID: Universally unique event ID for this error |
DESC: The diagnosis engine encountered telemetry from the listed devices for which it was unable to perform a diagnosis - ereport.io.pciex.rc.epkt@chassis0/cpuboard0/chip0/hostbridge0/pciexrc0 class and path are incompatible. | DESC: Basic description of the error |
AUTO-RESPONSE: Error reports have been logged for examination. | AUTO-RESPONSE: What the system has done (if anything) to alleviate any subsequent problems |
IMPACT: Automated diagnosis and response for these events will not occur. | IMPACT: Description of the assumed impact of the failure |
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Use 'fmdump -eV' to view the unexpected telemetry. Please refer to the associated reference document at http://support.oracle.com/msg/SUNOS-8000-J0 for the latest service procedures and policies regarding this diagnosis. | REC-ACTION: Brief description of the corrective action the system administrator should apply |
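When several of these messages have to be compared, for example to collect EVENT-ID values before running `fmadm faulty`, the "NAME: value" fields in Table 4-5 can be read into a dictionary. The following is a minimal Python sketch, assuming the syslog prefix has been removed from each line, that each field starts on its own line, and that wrapped continuation lines do not start with an upper-case "NAME:"; `parse_fma_message`, `KEY_RE`, and `INLINE_RE` are hypothetical names, not part of Oracle Solaris.

```python
import re

# A field line starts with an upper-case name such as EVENT-TIME or REC-ACTION.
KEY_RE = re.compile(r"^([A-Z][A-Z-]*):\s*(.*)$")
# Extra "NAME: value" pairs sharing a line, e.g. "SOURCE: eft, REV: 1.16".
INLINE_RE = re.compile(r",\s+([A-Z][A-Z-]*):\s*")


def parse_fma_message(text: str) -> dict[str, str]:
    """Hypothetical helper: map each field of a Fault Manager message to its value."""
    fields: dict[str, str] = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = KEY_RE.match(line)
        if match is None:
            if last_key is not None:
                fields[last_key] += " " + line  # continuation of a wrapped value
            continue
        parts = INLINE_RE.split(match.group(2))
        last_key = match.group(1)
        fields[last_key] = parts[0].strip()
        for key, value in zip(parts[1::2], parts[2::2]):
            fields[key] = value.strip()
            last_key = key
    return fields


sample = """\
EVENT-TIME: Thu Apr 19 10:48:39 JST 2012
PLATFORM: ORCL,SPARC64-X, CSN: PP115300MX, HOSTNAME: 4S-LGA12-D0
SOURCE: eft, REV: 1.16
EVENT-ID: fcbb42a5-47c3-c9c5-f0b0-f782d69afb01
DESC: The diagnosis engine encountered telemetry from the listed devices
    for which it was unable to perform a diagnosis - ereport.io.pciex.rc.epkt@chassis0/cpuboard0/chip0/hostbridge0/pciexrc0 class and path are incompatible.
AUTO-RESPONSE: Error reports have been logged for examination.
IMPACT: Automated diagnosis and response for these events will not occur.
"""
msg = parse_fma_message(sample)
print(msg["EVENT-ID"], msg["CSN"], msg["HOSTNAME"])
# fcbb42a5-47c3-c9c5-f0b0-f782d69afb01 PP115300MX 4S-LGA12-D0
```

The comma in "ORCL,SPARC64-X" is not followed by a space, so it stays part of the PLATFORM value rather than being treated as a field separator.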
Identifying the Location of the Chassis Requiring Maintenance
Execute the setlocator command to identify the location of the chassis requiring maintenance by causing the CHECK LED on the operation panel and the CHECK LED (locator) on the rear panel to blink.
- Log in to the XSCF shell.
- Execute the setlocator command to blink the CHECK LED of the chassis requiring maintenance, and determine its location.
The CHECK LEDs on the operation and rear panels blink.
- In the following example, the chassis requiring maintenance is the master chassis.
XSCF> setlocator blink
- If the chassis requiring maintenance is not the master chassis, execute "setlocator -b bb_id blink".
- For details on where to find and how to check the CHECK LEDs, see "2.3 Checking the LED Indications."
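The only difference between the two cases above is the -b operand. The choice can be captured in a small helper, for instance when preparing a maintenance checklist. This is a minimal Python sketch; `setlocator_command` is a hypothetical name, and it only formats the command text shown in this section without checking whether a given BB-ID exists in the system.

```python
def setlocator_command(bb_id=None):
    """Hypothetical helper: return the setlocator command text for one chassis.

    bb_id=None targets the master chassis; otherwise pass the BB-ID of the
    chassis requiring maintenance, as in "setlocator -b bb_id blink".
    """
    if bb_id is None:
        return "setlocator blink"
    # Reject booleans and negative or non-integer values; valid BB-IDs are not checked here.
    if isinstance(bb_id, bool) or not isinstance(bb_id, int) or bb_id < 0:
        raise ValueError("bb_id must be a non-negative integer BB-ID")
    return f"setlocator -b {bb_id} blink"


print(setlocator_command())   # setlocator blink
print(setlocator_command(1))  # setlocator -b 1 blink
```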