A.6.3 Example of the Active Replacement Procedure
This section describes an example of the procedure for actively replacing BB#01 using PPAR DR for the 2BB configuration system described in "Figure A-8 Configuration Example of a 2BB Configuration Where All the Resources are Assigned." The example is for an environment where dynamic PCIe bus assignment is available (Oracle VM Server for SPARC 3.2 or later).
This description also applies to SPARC M12.
Note - If the XSCF in the SPARC M10-4S to be actively replaced is defective, you cannot perform active replacement using PPAR DR. You must stop the physical partition to which that SPARC M10-4S belongs and then perform maintenance with the input power to the SPARC M10-4S turned off.
- Log in to the master XSCF.
Execute the showbbstatus command to check that the XSCF to which you have logged in is the master XSCF.
If you have logged in to a standby XSCF, log out and then log in to the master XSCF again.
XSCF> showbbstatus
BB#00 (Master)
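The master check in this step can be scripted against captured command output. The following is a minimal sketch, assuming the output has the form `BB#nn (Role)` as in the example above; the function names are hypothetical, and the `(Standby)` form used in the test is an assumption that does not appear in this section.

```python
import re

def parse_showbbstatus(output):
    """Return (bb_id, role) parsed from showbbstatus output, or None."""
    match = re.search(r"BB#(\d+)\s*\((\w+)\)", output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)

def logged_in_to_master(output):
    """True only when the output names the current XSCF as the master."""
    parsed = parse_showbbstatus(output)
    return parsed is not None and parsed[1] == "Master"

print(logged_in_to_master("BB#00 (Master)"))
```

If the function returns False, log out and log in to the master XSCF before continuing.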
- Execute the showhardconf command to check that [Status] of the XSCF in the SPARC M10-4S to be replaced is "Normal."
XSCF> showhardconf
SPARC M10-4S;
    + Serial: 2081230011; Operator_Panel_Switch:Locked;
    + System_Power:On; System_Phase:Cabinet Power On;
    Partition#0 PPAR_Status:Running;
    BB#00 Status:Normal; Role:Master; Ver:2003h; Serial:2081231002;
        + FRU-Part-Number: CA07361-D202 A1 ;
        + Power_Supply_System: ;
        + Memory_Size:256 GB;
        CMUL Status:Normal; Ver:0101h; Serial:PP1236052K ;
            + FRU-Part-Number:CA07361-D941 C4 /7060911 ;
            + Memory_Size:128 GB; Type: A ;
            CPU#0 Status:Normal; Ver:4142h; Serial:00322658;
                + Freq:3.000 GHz; Type:0x10;
                + Core:16; Strand:2;
    :
    BB#01 Status:Normal; Role:Standby; Ver:0101h;Serial:7867000297;
        + FRU-Part-Number: CA20393-B50X A2 ;
        + Power_Supply_System: ;
        + Memory_Size:256 GB;
        CMUL Status:Normal; Ver:0101h; Serial:PP123406CB ;
            + FRU-Part-Number:CA07361-D941 C4 /7060911 ;
            + Memory_Size:128 GB; Type: A ;
    :
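The status check on the showhardconf output can be automated in the same spirit. The sketch below assumes that the `BB#nn Status:...; Role:...;` line is the one this step refers to. The helper names and the abbreviated sample are illustrative only.

```python
import re

# Abbreviated sample based on the showhardconf example in this section.
SAMPLE = """\
    BB#00 Status:Normal; Role:Master; Ver:2003h; Serial:2081231002;
        + Memory_Size:256 GB;
        CMUL Status:Normal; Ver:0101h; Serial:PP1236052K ;
    BB#01 Status:Normal; Role:Standby; Ver:0101h;Serial:7867000297;
        + Memory_Size:256 GB;
"""

def bb_summary(showhardconf_output):
    """Map each BB-ID to its (status, role) pair."""
    pattern = re.compile(r"BB#(\d+)\s+Status:(\w+);\s*Role:(\w+);")
    return {int(bb): (status, role)
            for bb, status, role in pattern.findall(showhardconf_output)}

def can_replace_actively(summary, bb_id):
    """Hypothetical precheck: status is Normal and the BB is not the master."""
    status, role = summary[bb_id]
    return status == "Normal" and role != "Master"

print(bb_summary(SAMPLE))
```

A BB that holds the master role fails this precheck until the XSCF has been switched, which a later step of this procedure covers.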
- Execute the showbbstatus command to confirm that the XSCF in the SPARC M10-4S to be replaced is not the master XSCF.
XSCF> showbbstatus
BB#00 (Master)
- If the XSCF in the SPARC M10-4S to be replaced is the master XSCF, execute the switchscf command to switch the XSCF.
XSCF> switchscf -t Standby
The XSCF unit switch between the Active and Standby states.
Continue? [y|n] :y
Note - Confirm that the XSCF has been switched and rebooted before you release the SPARC M10-4S.
- Execute the console command to connect to the console of the control domain and then log in to it.
XSCF> console -p 0
- Release the redundant configuration of the system volume and physical I/O devices in the control domain.
Release the physical I/O devices on the SPARC M10-4S to be replaced (BB#01) that are used in the control domain. For details on the procedure for canceling a redundant configuration, see the documentation for the software that provides that redundant configuration.
- a. Cancel the redundant configuration of the system volume in the control domain. The following example describes how to cancel the ZFS mirroring function for the system volume in the control domain. Execute the zpool status command in the control domain to check the mirroring configuration status.
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  ONLINE       0     0     0
errors: No known data errors
- Execute the zpool detach command to release the disk from the mirroring configuration.
# zpool detach rpool c3t50000393A803B13Ed0
- Execute the zpool status command to confirm that the mirroring configuration has been canceled.
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
errors: No known data errors
- If you are using other devices in BB#01, remove the redundant configuration or stop using those devices. For details on how to cancel a redundant configuration or stop using the devices, see the documentation for the software for that redundant configuration and Oracle Solaris.
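The confirmation that the disk has left the pool can be sketched as a small parser over captured zpool status output. This is an illustration under assumptions: the function name is hypothetical, the samples are abbreviated from the examples above, and the caller must already know which device names belong to BB#01.

```python
# Abbreviated samples based on the zpool status examples in this section.
BEFORE = """\
  pool: rpool
 state: ONLINE
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  ONLINE       0     0     0
errors: No known data errors
"""
AFTER = BEFORE.replace(
    "            c3t50000393A803B13Ed0s0  ONLINE       0     0     0\n", "")

def pool_entries(zpool_status_output):
    """Map every name in the config table (pool, vdevs, disks) to its state."""
    entries = {}
    in_config = False
    for line in zpool_status_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":        # header row starts the config table
            in_config = True
            continue
        if fields[0] == "errors:":     # trailer line ends the config table
            in_config = False
        if in_config and len(fields) >= 2:
            entries[fields[0]] = fields[1]
    return entries

print("c3t50000393A803B13Ed0s0" in pool_entries(AFTER))
```

The detach has taken effect when the BB#01 device no longer appears among the parsed entries.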
b. Cancel the redundant configuration of the network of the control domain.
Execute the ipmpstat -i command to check the configuration information for the network interfaces configuring IPMP.
# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -smbM--   up        disabled  ok
net4        no      ipmp0       is-----   up        disabled  ok
Execute the if_mpadm -d command to release net4 from the IPMP group, and then execute the ipmpstat -i command to confirm that it has been released. The following example confirms that STATE is offline.
# if_mpadm -d net4
# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -smbM--   up        disabled  ok
net4        no      ipmp0       -s---d-   up        disabled  offline
Execute the ipadm delete-ip command to delete net4.
# ipadm delete-ip net4
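The IPMP state check in substep b, which belongs between if_mpadm -d and ipadm delete-ip, can be sketched as follows. The sketch assumes the whitespace-separated table layout of ipmpstat -i shown above; the function name is hypothetical.

```python
# Sample copied from the ipmpstat -i example taken after if_mpadm -d net4.
SAMPLE = """\
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -smbM--   up        disabled  ok
net4        no      ipmp0       -s---d-   up        disabled  offline
"""

def parse_ipmpstat(output):
    """Map each interface name to a dict of its column values."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    return {row[0]: dict(zip(header, row)) for row in data}

interfaces = parse_ipmpstat(SAMPLE)
print(interfaces["net4"]["STATE"])
```

In this sketch, deleting the interface would be gated on its STATE column reading offline.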
- Release the system board (PSB<BB>) of the SPARC M10-4S from the physical partition.
- a. Execute the deleteboard -c disconnect command to release the PSB from the physical partition. If you have not manually released the CPU core, memory, and PCIe root complex resources in advance, be sure to specify the "-m unbind=resource" option. When this option is specified, the resources are automatically deleted, and then the PSB is released.
XSCF> deleteboard -c disconnect -m unbind=resource 01-0
PSB#01-0 will be unconfigured from PPAR immediately. Continue?[y|n] :y
Start unconfigure preparation of PSB. [1200sec]
0end
Unconfigure preparation of PSB has completed.
Start unconfiguring PSB from PPAR. [7200sec]
0..... 30..... 60....end
Unconfigured PSB from PPAR.
PSB power off sequence started. [1200sec]
0..... 30..... 60..... 90.....120.....150.....end
Operation has completed.
- b. Execute the showresult command to check the exit status of the deleteboard command that was just executed. An end value of 0 indicates normal termination of the deleteboard command. If the end value is other than 0, or if an error message is displayed when the deleteboard command is executed, the command terminated abnormally. Refer to "C.1.2 deleteboard," identify the error based on the error message, and then take corrective action.
XSCF> showresult
0
- c. Execute the showboards command to check the PSB status. Confirm that the PSB in the SPARC M10-4S to be replaced is in the "Assigned" state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    n    n    n    Passed  Normal
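The showboards check can be expressed as a short parser. This sketch assumes that every column value is a single token without spaces, as in the example above. The column keys and function names are hypothetical.

```python
import re

COLUMNS = ("psb", "ppar_lsb", "assignment", "pwr", "conn", "conf", "test", "fault")

# Sample copied from the showboards example taken after deleteboard.
SAMPLE = """\
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    n    n    n    Passed  Normal
"""

def parse_showboards(output):
    """Map each PSB number (for example "01-0") to a dict of its columns."""
    boards = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a PSB number; header and rule lines do not.
        if len(fields) == len(COLUMNS) and re.fullmatch(r"\d{2}-\d", fields[0]):
            boards[fields[0]] = dict(zip(COLUMNS, fields))
    return boards

def is_released(board):
    """Assigned, with Pwr, Conn, and Conf all showing "n"."""
    flags = (board["pwr"], board["conn"], board["conf"])
    return board["assignment"] == "Assigned" and flags == ("n", "n", "n")

print(is_released(parse_showboards(SAMPLE)["01-0"]))
```

The board is ready for replacefru only when `is_released` holds for the target PSB.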
- Execute the replacefru command to replace the SPARC M10-4S.
XSCF> replacefru
Note - For details on the replacement of a SPARC M10-4S by using the replacefru command, see "5.8 Releasing a SPARC M10-4/M10-4S FRU from the System with the replacefru Command" and "6.2 Incorporating a SPARC M10-4/M10-4S FRU into the System with the replacefru Command" in the Fujitsu M10-4/Fujitsu M10-4S/SPARC M10-4/SPARC M10-4S Service Manual.
- Incorporate the PSB into the physical partition.
- a. Execute the showboards command to check the PSB status. Confirm that the PSB in the replaced SPARC M10-4S is in the "Assigned" state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    n    n    n    Passed  Normal
- b. Execute the addboard -c configure command to incorporate the PSB into the physical partition. To recover the original logical domain configuration, execute the addboard -c configure command with the -m bind=resource option specified.
XSCF> addboard -c configure -m bind=resource -p 0 01-0
PSB#01-0 will be configured into PPAR-ID 0. Continue?[y|n] :y
Start connecting PSB to PPAR. [3600sec]
0..... 30..... 60..... 90.....120.....150.....180.....210.....240.....
270.....300.....330.....360.....390.....420.....450.....480.....510.....
540.....570.....600.....630.....660.....690.....720.....750.....780.....
810.....840.....870.....900.....930.....960.....end
Connected PSB to PPAR.
Start configuring PSB to Logical Domains (LDoms) Manager. [1800sec]
0.....end
Configured PSB to Logical Domains (LDoms) Manager.
Operation has completed.
Note - If an error message appears during execution of the addboard command, see "C.1.1 addboard," and then identify the error and take corrective action.
- c. Execute the showresult command to check the exit status of the addboard command that was just executed. An end value of 0 indicates normal termination of the addboard command. If the end value is other than 0, or if an error message is displayed when the addboard command is executed, the command terminated abnormally. Refer to "C.1.1 addboard," identify the error based on the error message, and then take corrective action.
XSCF> showresult
0
- d. Execute the showboards command to check the PSB status. Confirm that both the [Conn] and [Conf] columns show "y" after the PSB in the replaced SPARC M10-4S has been successfully incorporated.
XSCF> showboards -p 0
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    y    y    y    Passed  Normal
- Restore the system volume and physical I/O devices on the control domain to a redundant configuration.
- a. Place the system volume in the control domain in a redundant configuration. The following example describes how to configure the ZFS mirroring function for the system volume in the control domain. Execute the zpool status command in the control domain to check the mirroring configuration status.
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 29.1M in 0h0m with 0 errors on Thu Jan 23 17:27:59 2014
config:
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c2t50000393E802CCE2d0s0  ONLINE       0     0     0
errors: No known data errors
- Execute the zpool attach command to incorporate the disks into a mirroring configuration.
# zpool attach rpool c2t50000393E802CCE2d0s0 c3t50000393A803B13Ed0s0
Make sure to wait until resilver is done before rebooting.
#
- Execute the zpool status command to confirm that the mirroring configuration has been established and to check whether synchronization processing (resilver) is complete. The following shows an example of the display during synchronization processing.
# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Mon Jan 27 15:55:47 2014
    21.1G scanned out of 70.6G at 120M/s, 0h7m to go
    21.0G resilvered, 29.84% done
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  DEGRADED     0     0     0  (resilvering)
errors: No known data errors
- Once synchronization processing is complete, the display is as follows:
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 70.6G in 0h9m with 0 errors on Mon Jan 27 16:05:34 2014
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  ONLINE       0     0     0
errors: No known data errors
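Waiting for the resilver to finish is easy to get wrong by eye, so the two zpool status displays above can be distinguished programmatically. The following is a minimal sketch, assuming the "state:" and "scan:" wording shown in the examples; the function name and returned keys are hypothetical.

```python
import re

# Abbreviated samples based on the two zpool status examples in this section.
IN_PROGRESS = """\
  pool: rpool
 state: DEGRADED
  scan: resilver in progress since Mon Jan 27 15:55:47 2014
    21.1G scanned out of 70.6G at 120M/s, 0h7m to go
    21.0G resilvered, 29.84% done
"""
COMPLETE = """\
  pool: rpool
 state: ONLINE
  scan: resilvered 70.6G in 0h9m with 0 errors on Mon Jan 27 16:05:34 2014
"""

def resilver_progress(zpool_status_output):
    """Summarize the pool state and the resilver progress, if any."""
    state = re.search(r"state:\s*(\w+)", zpool_status_output)
    percent = re.search(r"([\d.]+)% done", zpool_status_output)
    return {
        "state": state.group(1) if state else None,
        "in_progress": "resilver in progress" in zpool_status_output,
        "percent_done": float(percent.group(1)) if percent else None,
    }

print(resilver_progress(IN_PROGRESS))
```

In this sketch, the mirror counts as restored once the state is ONLINE and no resilver is in progress.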
- If you are using other devices in BB#01, establish a redundant configuration or resume the use of the devices. For details on how to establish a redundant configuration or resume the use of devices, see the documentation for the software for that redundant configuration and Oracle Solaris.