
A.6.3 Example of the Active Replacement Procedure
This section describes an example of the procedure for actively replacing BB#01 using PPAR DR for the 2BB configuration system described in "Figure A-8 Configuration Example of a 2BB Configuration Where All the Resources are Assigned." The example is for an environment where dynamic PCIe bus assignment is available (Oracle VM Server for SPARC 3.2 or later).
This description also applies to SPARC M12.
Note - If the XSCF in the SPARC M10-4S to be actively replaced is defective, you cannot perform active replacement using PPAR DR.
Instead, stop the physical partition to which that SPARC M10-4S belongs, and then perform maintenance with the input power to that SPARC M10-4S turned off.
  1. Log in to the master XSCF.
    Execute the showbbstatus command to check that the XSCF to which you have logged in is the master XSCF.
    If you have logged in to a standby XSCF, log out and then log in to the master XSCF again.
XSCF> showbbstatus
BB#00 (Master)
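The master check in this step can be scripted once the showbbstatus output has been captured as text. The following is a minimal sketch, assuming the single-line "BB#nn (Master)" layout shown above. The function name master_bb is hypothetical, and how the text is fetched from the XSCF is deliberately left out.

```shell
#!/bin/sh
# master_bb (hypothetical helper): read showbbstatus output on stdin and
# print the BB name if the XSCF that produced it is the master.
# Prints nothing when the role is anything other than "(Master)".
# Assumes the "BB#nn (Role)" layout shown in this example.
master_bb() {
    awk '$2 == "(Master)" { print $1 }'
}
```

An empty result means that the session is not on the master XSCF, in which case you log out and log in to the master XSCF as described above.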
  1. Execute the showhardconf command to check that [Status] of the XSCF in the SPARC M10-4S to be replaced is "Normal."
XSCF> showhardconf
SPARC M10-4S;
+ Serial: 2081230011; Operator_Panel_Switch:Locked;
+ System_Power:On; System_Phase:Cabinet Power On;
Partition#0 PPAR_Status:Running;
BB#00 Status:Normal; Role:Master; Ver:2003h; Serial:2081231002;
+ FRU-Part-Number: CA07361-D202 A1 ;
+ Power_Supply_System: ;
+ Memory_Size:256 GB;
CMUL Status:Normal; Ver:0101h; Serial:PP1236052K ;
+ FRU-Part-Number:CA07361-D941 C4 /7060911 ;
+ Memory_Size:128 GB; Type: A ;
CPU#0 Status:Normal; Ver:4142h; Serial:00322658;
+ Freq:3.000 GHz; Type:0x10;
+ Core:16; Strand:2;

BB#01 Status:Normal; Role:Standby; Ver:0101h;Serial:7867000297;
+ FRU-Part-Number: CA20393-B50X A2 ;
+ Power_Supply_System: ;
+ Memory_Size:256 GB;
CMUL Status:Normal; Ver:0101h; Serial:PP123406CB ;
+ FRU-Part-Number:CA07361-D941 C4 /7060911 ;
+ Memory_Size:128 GB; Type: A ;
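The [Status] check can also be done mechanically on saved showhardconf output. The following is a sketch only, assuming each building block line begins with "BB#nn Status:<value>;" as in the listing above. The helper name bb_status is hypothetical.

```shell
#!/bin/sh
# bb_status (hypothetical helper): read showhardconf output on stdin and
# print the Status value of the given building block, e.g. "Normal".
# Usage: bb_status 'BB#01' < showhardconf.txt
# Assumes the "BB#nn Status:<value>; ..." line layout shown in this example.
bb_status() {
    awk -v bb="$1" '$1 == bb && $2 ~ /^Status:/ {
        s = $2
        sub(/^Status:/, "", s)   # drop the field label
        sub(/;$/, "", s)         # drop the trailing separator
        print s
        exit
    }'
}
```

Any value other than "Normal" for the SPARC M10-4S to be replaced means that this procedure should not be continued as written.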
  1. Execute the showbbstatus command to confirm that the XSCF in the SPARC M10-4S to be replaced is not the master XSCF.
XSCF> showbbstatus
BB#00 (Master)
  1. If the XSCF in the SPARC M10-4S to be replaced is the master XSCF, execute the switchscf command to switch the XSCF.
XSCF> switchscf -t Standby
The XSCF unit switch between the Active and Standby states.
Continue? [y|n] :y
Note - Confirm that the XSCF has been switched and rebooted before you release the SPARC M10-4S.
  1. Execute the console command to connect to the console of the control domain and then log in to it.
XSCF> console -p 0
  1. Release the redundant configuration of the system volume and physical I/O devices in the control domain.
    Release the physical I/O devices on the SPARC M10-4S to be replaced (BB#01) that are used in the control domain. For details on canceling a redundant configuration, see the documentation for the software that provides that redundant configuration.
  1. a. Cancel the redundant configuration of the system volume in the control domain.

    The following example describes how to cancel the ZFS mirroring function for the system volume in the control domain.

    Execute the zpool status command in the control domain to check the mirroring configuration status.
# zpool status rpool
pool: rpool
state: ONLINE
scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t50000393E802CCE2d0s0 ONLINE 0 0 0
c3t50000393A803B13Ed0s0 ONLINE 0 0 0
errors: No known data errors
  1. Execute the zpool detach command to release the disk from the mirroring configuration.
# zpool detach rpool c3t50000393A803B13Ed0s0
  1. Execute the zpool status command to confirm that the mirroring configuration has been canceled.
# zpool status rpool
pool: rpool
state: ONLINE
scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c2t50000393E802CCE2d0s0 ONLINE 0 0 0
errors: No known data errors
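Whether the disk on BB#01 is still part of the pool can be checked from the zpool status text rather than by eye. The following is a minimal sketch, assuming the device name appears as the first field of a line in the config section, as in the listings above. The helper name pool_has_device is hypothetical.

```shell
#!/bin/sh
# pool_has_device (hypothetical helper): read "zpool status" output on stdin
# and succeed (exit 0) if the named device is listed, fail (exit 1) otherwise.
# Usage: zpool status rpool | pool_has_device c3t50000393A803B13Ed0s0
pool_has_device() {
    awk -v dev="$1" '
        $1 == dev { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

After a successful detach, the check for the BB#01 disk should fail while the check for the remaining disk still succeeds.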
  1. If you are using other devices in BB#01, remove the redundant configuration or stop using those devices. For details on how to cancel a redundant configuration or stop using the devices, see the documentation for the software for that redundant configuration and Oracle Solaris.
  1. b. Cancel the redundant configuration of the control domain network.

    Execute the ipmpstat -i command to check the configuration of the network interfaces that make up the IPMP group.

# ipmpstat -i
INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
net0 yes ipmp0 -smbM-- up disabled ok
net4 no ipmp0 is----- up disabled ok
Execute the if_mpadm -d command to release net4 from the IPMP group, and then execute the ipmpstat -i command to confirm that it has been released. The following example confirms that STATE is offline.
# if_mpadm -d net4
# ipmpstat -i
INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
net0 yes ipmp0 -smbM-- up disabled ok
net4 no ipmp0 -s---d- up disabled offline
Execute the ipadm delete-ip command to delete net4.
# ipadm delete-ip net4
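The STATE check on net4 can be expressed as a small filter over the ipmpstat -i text. The following is a sketch under the assumption that STATE is the last column, as in the output above. The helper name ipmp_state is hypothetical, and it would have to be run against output captured before the ipadm delete-ip command removes the interface.

```shell
#!/bin/sh
# ipmp_state (hypothetical helper): read "ipmpstat -i" output on stdin and
# print the STATE column (last field) for the given interface.
# Usage: ipmpstat -i | ipmp_state net4
ipmp_state() {
    awk -v ifname="$1" '$1 == ifname { print $NF; exit }'
}
```

In this example, the expected value for net4 after if_mpadm -d is "offline".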
  1. Release the system board (PSB<BB>) of the SPARC M10-4S from the physical partition.
  1. a. Execute the deleteboard -c disconnect command to release the PSB from the physical partition.

    If you have not manually released the CPU core, memory, and PCIe root complex resources in advance, be sure to specify the "-m unbind=resource" option. With this option, the resources are deleted automatically, and then the PSB is released.
XSCF> deleteboard -c disconnect -m unbind=resource 01-0
PSB#01-0 will be unconfigured from PPAR immediately.
Continue?[y|n] :y
Start unconfigure preparation of PSB. [1200sec]
0end
Unconfigure preparation of PSB has completed.
Start unconfiguring PSB from PPAR. [7200sec]
0..... 30..... 60....end

Unconfigured PSB from PPAR.
PSB power off sequence started. [1200sec]
0..... 30..... 60..... 90.....120.....150.....end

Operation has completed.
  1. b. Execute the showresult command to check the exit status of the deleteboard command that was just executed.

    An end value of 0 indicates the normal termination of the deleteboard command.

    An end value other than 0, or an error message displayed by the deleteboard command, indicates that the command terminated abnormally. In that case, see "C.1.2 deleteboard," identify the error from the message, and take corrective action.
XSCF> showresult
0
  1. c. Execute the showboards command to check the PSB status.

    Confirm that the PSB in the SPARC M10-4S to be replaced is in the "Assigned" state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0
PSB PPAR-ID(LSB) Assignment Pwr Conn Conf Test Fault
---- ------------ ----------- ---- ---- ---- ------- --------

00-0 00(00) Assigned y y y Passed Normal
01-0 00(01) Assigned n n n Passed Normal
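The column check for a single PSB can be reduced to one line of text for comparison. The following is a minimal sketch, assuming the eight-column showboards layout shown above (PSB, PPAR-ID(LSB), Assignment, Pwr, Conn, Conf, Test, Fault). The helper name psb_state is hypothetical.

```shell
#!/bin/sh
# psb_state (hypothetical helper): read "showboards" output on stdin and
# print "Assignment Pwr Conn Conf" for the given PSB, e.g. "Assigned n n n".
# Usage: psb_state 01-0 < showboards.txt
# Assumes the column order of the listing in this example.
psb_state() {
    awk -v psb="$1" '$1 == psb { print $3, $4, $5, $6; exit }'
}
```

For the released board in this step, the expected result is "Assigned n n n". The same filter could be reused after the board is incorporated again, where "Assigned y y y" is expected.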
  1. Execute the replacefru command to replace the SPARC M10-4S.
XSCF> replacefru
Note - For details on the replacement of SPARC M10-4Ss by using the replacefru command, see "5.8 Releasing a SPARC M10-4/M10-4S FRU from the System with the replacefru Command" and "6.2 Incorporating a SPARC M10-4/M10-4S FRU into the System with the replacefru Command" in the Fujitsu M10-4/Fujitsu M10-4S/SPARC M10-4/SPARC M10-4S Service Manual.
  1. Incorporate the PSB into the physical partition.
  1. a. Execute the showboards command to check the PSB status.

    Confirm that the PSB in the replaced SPARC M10-4S is in the Assigned state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0
PSB PPAR-ID(LSB) Assignment Pwr Conn Conf Test Fault
---- ------------ ----------- ---- ---- ---- ------- --------

00-0 00(00) Assigned y y y Passed Normal
01-0 00(01) Assigned n n n Passed Normal
  1. b. Execute the addboard -c configure command to incorporate the PSB into the physical partition.

    To recover the original logical domain configuration, execute the addboard -c configure command with the -m bind=resource option specified.
XSCF> addboard -c configure -m bind=resource -p 0 01-0
PSB#01-0 will be configured into PPAR-ID 0. Continue?[y|n] :y
Start connecting PSB to PPAR. [3600sec]
0..... 30..... 60..... 90.....120.....150.....180.....210.....240.....

270.....300.....330.....360.....390.....420.....450.....480.....510.....

540.....570.....600.....630.....660.....690.....720.....750.....780.....

810.....840.....870.....900.....930.....960.....end

Connected PSB to PPAR.
Start configuring PSB to Logical Domains (LDoms) Manager. [1800sec]
0.....end
Configured PSB to Logical Domains (LDoms) Manager.
Operation has completed.
Note - If an error message appears during execution of the addboard command, see "C.1.1 addboard," and then identify the error and take corrective action.
  1. c. Execute the showresult command to check the exit status of the addboard command that was just executed.

    An end value of 0 indicates the normal termination of the addboard command.

    An end value other than 0, or an error message displayed by the addboard command, indicates that the command terminated abnormally. In that case, see "C.1.1 addboard," identify the error from the message, and take corrective action.
XSCF> showresult
0
  1. d. Execute the showboards command to check the PSB status.

    Confirm that both the [Conn] and [Conf] columns show "y" after the PSB in the replaced SPARC M10-4S has been successfully incorporated.
XSCF> showboards -p 0
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------

00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    y    y    y    Passed  Normal
  1. Restore the system volume and physical I/O devices on the control domain to a redundant configuration.
  1. a. Place the system volume in the control domain in a redundant configuration.

    The following example describes how to configure the ZFS mirroring function for the system volume in the control domain.

    Execute the zpool status command in the control domain to check the mirroring configuration status.
# zpool status rpool
pool: rpool
state: ONLINE
scan: resilvered 29.1M in 0h0m with 0 errors on Thu Jan 23 17:27:59 2014
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c2t50000393E802CCE2d0s0 ONLINE 0 0 0
errors: No known data errors
  1. Execute the zpool attach command to incorporate the disk into a mirroring configuration.
# zpool attach rpool c2t50000393E802CCE2d0s0 c3t50000393A803B13Ed0s0
Make sure to wait until resilver is done before rebooting.
#
  1. Execute the zpool status command to confirm that the mirroring configuration has been established and to check whether synchronization processing (resilver) has completed.

    The following shows an example of the display during synchronization processing.
# zpool status rpool
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function in a degraded state.
action: Wait for the resilver to complete.
Run 'zpool status -v' to see device specific details.
scan: resilver in progress since Mon Jan 27 15:55:47 2014
21.1G scanned out of 70.6G at 120M/s, 0h7m to go
21.0G resilvered, 29.84% done
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c2t50000393E802CCE2d0s0 ONLINE 0 0 0
c3t50000393A803B13Ed0s0 DEGRADED 0 0 0 (resilvering)
errors: No known data errors
  1. Once synchronization processing is complete, the output is as follows:
# zpool status rpool
pool: rpool
state: ONLINE
scan: resilvered 70.6G in 0h9m with 0 errors on Mon Jan 27 16:05:34 2014
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t50000393E802CCE2d0s0 ONLINE 0 0 0
c3t50000393A803B13Ed0s0 ONLINE 0 0 0
errors: No known data errors
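Waiting for the resilver can be automated by polling zpool status until the "resilver in progress" line disappears. The following is a sketch, assuming the scan line wording shown in the outputs above. The function names and the default 60-second interval are hypothetical choices, not values from this manual.

```shell
#!/bin/sh
# resilver_in_progress (hypothetical helper): read "zpool status" output on
# stdin and succeed (exit 0) while it reports an active resilver.
resilver_in_progress() {
    grep 'resilver in progress' >/dev/null
}

# wait_for_resilver POOL [INTERVAL_SECONDS] (hypothetical helper):
# poll "zpool status POOL" until no resilver is reported any more.
# The 60-second default interval is an arbitrary assumption.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
}
```

For example, running wait_for_resilver rpool in the control domain returns once the pool no longer reports a resilver. The loop only detects that the resilver has ended, so the final zpool status output should still be checked for state ONLINE and "No known data errors".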
  1. If you are using other devices in BB#01, establish a redundant configuration or resume the use of the devices. For details on how to establish a redundant configuration or resume the use of devices, see the documentation for the software for that redundant configuration and Oracle Solaris.