A.2.4 Example of the Active Replacement Procedure (When Dynamic Assignment of the PCIe Bus is Available)
This section describes an example of the procedure for actively replacing BB#01 using PPAR DR for the 2BB configuration system described in "Figure A-2 Configuration Example in Which Operations Can Continue in the 2BB Configuration." The example is for an environment where the PCIe bus is assignable dynamically (XCP 2240 or later with Oracle VM Server for SPARC 3.2 or later and the root domain with Oracle Solaris 11.2 SRU11.2.8.4.0 or later).
Note - If the XSCF in the SPARC M10-4S to be actively replaced is defective, you cannot perform active replacement using PPAR DR. You must stop the physical partition to which that SPARC M10-4S belongs and then perform maintenance with the input power to the SPARC M10-4S to be replaced turned off.
- Log in to the master XSCF.
Execute the showbbstatus command to check that the XSCF to which you have logged in is the master XSCF.
If you have logged in to a standby XSCF, log out and then log in to the master XSCF again.
XSCF> showbbstatus BB#00 (Master) |
- Execute the showhardconf command to check that [Status] of the XSCF in the SPARC M10-4S to be replaced is "Normal."
XSCF> showhardconf SPARC M10-4S; + Serial: 2081230011; Operator_Panel_Switch:Locked; + System_Power:On; System_Phase:Cabinet Power On; Partition#0 PPAR_Status:Running; BB#00 Status:Normal; Role:Master; Ver:2003h; Serial:2081231002; + FRU-Part-Number: CA07361-D202 A1 ; + Power_Supply_System: ; + Memory_Size:256 GB; CMUL Status:Normal; Ver:0101h; Serial:PP1236052K ; + FRU-Part-Number:CA07361-D941 C4 /7060911 ; + Memory_Size:128 GB; Type: A ; CPU#0 Status:Normal; Ver:4142h; Serial:00322658; + Freq:3.000 GHz; Type:0x10; + Core:16; Strand:2; : BB#01 Status:Normal; Role:Standby; Ver:0101h;Serial:7867000297; + FRU-Part-Number: CA20393-B50X A2 ; + Power_Supply_System: ; + Memory_Size:256 GB; CMUL Status:Normal; Ver:0101h; Serial:PP123406CB ; + FRU-Part-Number:CA07361-D941 C4 /7060911 ; + Memory_Size:128 GB; Type: A ; : |
- Execute the showbbstatus command to confirm that the XSCF in the SPARC M10-4S to be replaced is not the master XSCF.
XSCF> showbbstatus BB#00 (Master) |
- If the SPARC M10-4S to be replaced is the master XSCF, execute the switchscf command to switch the XSCF.
XSCF> switchscf -t Standby The XSCF unit switch between the Active and Standby states. Continue? [y|n] :y |
Note - Confirm that the XSCF has been switched and rebooted before you release the system board.
- Execute the console command to connect to the console of the control domain and then log in to it.
XSCF> console -p 0 |
- Check the operation status and resource usage status of the logical domain.
- a. Execute the ldm list-domain command to check the operation status of the logical domains. To determine the operation status, check the combination of [STATE] and [FLAGS]. If [STATE] indicates "active", the second character from the left of the string in [FLAGS] has the following meaning.
  - "n": Oracle Solaris is operating
  - "t": OpenBoot PROM status
  - "-": In another state (including [STATE] other than "active")
- Confirm that every domain is either in the "active" state with Oracle Solaris operating or in the "inactive" state. If any domain is in the OpenBoot PROM status or the bound state, the dynamic reconfiguration of the physical partition may fail. The following example shows that the control domain, two root domains, and two guest domains are operating.
# ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- UART 64 56G 0.0% 1h 33m guest0 active -n---- 5100 64 64G 3.1% 2s guest1 active -n---- 5101 64 64G 1.6% 18m root-dom0 active -n--v- 5000 32 32G 3.1% 17m root-dom1 active -n--v- 5001 32 32G 3.1% 17m |
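The check described above can be sketched as a small script. This is a minimal illustration, assuming the ldm list-domain output has been captured as text with whitespace-separated columns as in the example; the function name domains_blocking_dr is hypothetical and is not part of Oracle VM Server for SPARC.

```python
def domains_blocking_dr(listing: str) -> list[str]:
    """Return the names of domains whose state may cause PPAR DR to fail.

    A domain is acceptable if it is "inactive", or "active" with "n"
    (Oracle Solaris is operating) as the second character of FLAGS.
    """
    blocking = []
    for line in listing.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip the header row and blank lines
        name, state, flags = fields[0], fields[1], fields[2]
        if state == "inactive":
            continue
        if state == "active" and len(flags) > 1 and flags[1] == "n":
            continue
        blocking.append(name)  # e.g. OpenBoot PROM status or bound state
    return blocking


# Sample text copied from the example output in this section.
SAMPLE = """\
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- UART 64 56G 0.0% 1h 33m
guest0 active -n---- 5100 64 64G 3.1% 2s
guest1 active -n---- 5101 64 64G 1.6% 18m
root-dom0 active -n--v- 5000 32 32G 3.1% 17m
root-dom1 active -n--v- 5001 32 32G 3.1% 17m
"""

print(domains_blocking_dr(SAMPLE))  # [] because every domain is running Oracle Solaris
```

An empty result suggests that no domain is in a state that is known to interfere with dynamic reconfiguration.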
- b. Execute the ldm list-devices command with the -a option specified to check the resource usage status. In the following example, the -a option is specified to display all resources, both those bound to logical domains and those that are not bound.
# ldm list-devices -a CORE ID %FREE CPUSET 0 0 (0, 1) 4 0 (8, 9) 8 0 (16, 17) (Omitted) 944 0 (1888, 1889) 948 0 (1896, 1897) 952 0 (1904, 1905) 956 0 (1912, 1913) VCPU PID %FREE PM 0 0 no 1 0 no 8 0 no 9 0 no (Omitted) 1904 0 no 1905 0 no 1912 0 no 1913 0 no (Omitted) |
- Release the redundant configuration of the system volume and I/O devices in the control domain.
To enable the release of BB#01, release the I/O devices that are in the SPARC M10-4S to be replaced and are used by the control domain. For details on the procedure for canceling a redundant configuration, see the documentation for the software for that redundant configuration.
- a. Cancel the redundant configuration of the system volume in the control domain. The following example describes how to cancel the ZFS mirroring function for the system volume in the control domain. Execute the zpool status command in the control domain to check the mirroring configuration status.
# zpool status rpool pool: rpool state: ONLINE scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c2t50000393E802CCE2d0s0 ONLINE 0 0 0 c3t50000393A803B13Ed0s0 ONLINE 0 0 0 errors: No known data errors |
- Execute the zpool detach command to release the disk from the mirroring configuration.
# zpool detach rpool c3t50000393A803B13Ed0 |
- Execute the zpool status command to confirm that the mirroring configuration has been canceled.
# zpool status rpool pool: rpool state: ONLINE scan: resilvered 28.7M in 0h0m with 0 errors on Tue Jan 21 10:10:01 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c2t50000393E802CCE2d0s0 ONLINE 0 0 0 errors: No known data errors |
- If you are using other devices in BB#01, remove the redundant configuration or stop using those devices. For details on how to cancel a redundant configuration or stop using the devices, see the documentation for the software for that redundant configuration and Oracle Solaris.
- b. Delete the I/O configuration of the control domain. From among the physical I/O devices assigned to the control domain, delete the root complexes in BB#01 through dynamic reconfiguration.
- Execute the ldm list-io command to check the root complexes assigned to the primary domain. The following example shows that the root complexes with BB1 devices are PCIE8 and PCIE12.
# ldm list-io | grep primary PCIE0 BUS PCIE0 primary IOV PCIE4 BUS PCIE4 primary IOV PCIE8 BUS PCIE8 primary IOV PCIE12 BUS PCIE12 primary IOV /BB0/CMUL/NET0 PCIE PCIE0 primary OCC /BB0/CMUL/SASHBA PCIE PCIE0 primary OCC /BB0/CMUL/NET2 PCIE PCIE4 primary OCC /BB1/CMUL/NET0 PCIE PCIE8 primary OCC /BB1/CMUL/SASHBA PCIE PCIE8 primary OCC /BB1/CMUL/NET2 PCIE PCIE12 primary OCC |
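Identifying the root complexes to remove can be sketched as follows. This is a hedged illustration that assumes the ldm list-io endpoint rows have the columns NAME, TYPE, BUS, DOMAIN, and STATUS separated by whitespace, as in the example; the helper name buses_on_board is hypothetical.

```python
def buses_on_board(listing: str, board: str, domain: str) -> list[str]:
    """List the root complexes (BUS column) that carry endpoints of the
    given building block and are assigned to the given domain."""
    prefix = f"/{board}/"
    buses: list[str] = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Endpoint rows look like: /BB1/CMUL/NET0 PCIE PCIE8 primary OCC
        if len(fields) >= 4 and fields[0].startswith(prefix) and fields[3] == domain:
            if fields[2] not in buses:
                buses.append(fields[2])
    return buses


# Sample text copied from the example output in this section.
SAMPLE = """\
PCIE0 BUS PCIE0 primary IOV
PCIE4 BUS PCIE4 primary IOV
PCIE8 BUS PCIE8 primary IOV
PCIE12 BUS PCIE12 primary IOV
/BB0/CMUL/NET0 PCIE PCIE0 primary OCC
/BB0/CMUL/SASHBA PCIE PCIE0 primary OCC
/BB0/CMUL/NET2 PCIE PCIE4 primary OCC
/BB1/CMUL/NET0 PCIE PCIE8 primary OCC
/BB1/CMUL/SASHBA PCIE PCIE8 primary OCC
/BB1/CMUL/NET2 PCIE PCIE12 primary OCC
"""

print(buses_on_board(SAMPLE, "BB1", "primary"))  # ['PCIE8', 'PCIE12']
```

The result matches the two root complexes that the procedure removes from the primary domain.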
- Execute the ldm remove-io command to delete PCIE8 and PCIE12 from the primary.
# ldm remove-io PCIE8 primary # ldm remove-io PCIE12 primary |
- Execute the ldm list-io command to confirm that the root complexes in BB#01 have been deleted from the control domain.
# ldm list-io | grep primary PCIE0 BUS PCIE0 primary IOV PCIE4 BUS PCIE4 primary IOV /BB0/CMUL/NET0 PCIE PCIE0 primary OCC /BB0/CMUL/SASHBA PCIE PCIE0 primary OCC /BB0/CMUL/NET2 PCIE PCIE4 primary OCC |
- c. Cancel the redundant configuration of the virtual I/O devices assigned to the guest domains. Because the root domain (root-dom1) to which the root complexes in BB#01 are assigned will be shut down, first log in to each guest domain and cancel the redundant configuration of the virtual I/O devices provided by root-dom1. For details on how to use the redundant configuration software, see the documentation for that software.
- In the following example, a virtual network device (vnet1) is removed from the IPMP configuration. For details on the commands, see the manual for Oracle Solaris. Log in to the guest domain (guest0).
# ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- UART 64 56G 0.0% 4h 17m guest0 active -n---- 5100 64 64G 0.0% 1h 13m guest1 active -n---- 5101 64 64G 0.0% 1h 4m root-dom0 active -n--v- 5000 32 32G 0.0% 1h 47m root-dom1 active -n--v- 5001 32 32G 0.0% 1h 19m # telnet localhost 5100 .... guest0# |
- Execute the dladm show-phys command to check the mapping between the virtual network interface (vnet1) and the network interface name (net1).
guest0# dladm show-phys LINK MEDIA STATE SPEED DUPLEX DEVICE net0 Ethernet up 0 unknown vnet0 net1 Ethernet up 0 unknown vnet1 |
- Execute the ipmpstat -i command to check the configuration information for the network interfaces configuring IPMP.
guest0# ipmpstat -i INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE net0 yes ipmp0 -smbM-- up disabled ok net1 no ipmp0 is----- up disabled ok |
- Execute the if_mpadm -d command to release net1 from the IPMP group, and then execute the ipmpstat -i command to confirm that it has been released. The following example confirms that STATE is offline.
guest0# if_mpadm -d net1 guest0# ipmpstat -i INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE net0 yes ipmp0 -smbM-- up disabled ok net1 no ipmp0 -s---d- up disabled offline
- Execute the ipadm delete-ip command to delete net1.
guest0# ipadm delete-ip net1 |
Perform the same release processing for the other guest domain (guest1).
- d. Remove the virtual I/O devices provided by the root domain to be stopped. Execute the ldm remove-vdisk and ldm remove-vnet commands to remove the virtual disk (vdisk) and virtual network device (vnet) that are assigned from the root domain to be stopped. The following example shows the execution of the commands for removing the virtual disk (vdisk10) and virtual network device (vnet10) that use the virtual I/O service of the BB#01 root domain (root-dom1).
# ldm remove-vdisk vdisk10 guest0 # ldm remove-vnet vnet10 guest0 |
- Perform the same deletion for the guest domain (guest1).
- Check the resource usage status of the I/O devices, and then cancel all the I/O devices in the SPARC M10-4S to be replaced.
- a. Check the logical domain to which the root complexes in the SPARC M10-4S to be released are assigned. Execute the ldm list-io command to check the logical domain to which the root complexes of the SPARC M10-4S in BB#01 are assigned. The following example shows that only root-dom1 has PCIe endpoints starting with "/BB1/." You can see that the PCIe endpoint root complexes (BUS) PCIE9, PCIE10, PCIE11, PCIE13, PCIE14, and PCIE15 are assigned to root-dom1.
# ldm list-io NAME TYPE BUS DOMAIN STATUS ---- ---- --- ------ ------ PCIE0 BUS PCIE0 primary IOV PCIE1 BUS PCIE1 root-dom0 IOV PCIE2 BUS PCIE2 root-dom0 IOV PCIE3 BUS PCIE3 root-dom0 IOV PCIE4 BUS PCIE4 primary IOV PCIE5 BUS PCIE5 root-dom0 IOV PCIE6 BUS PCIE6 root-dom0 IOV PCIE7 BUS PCIE7 root-dom0 IOV PCIE8 BUS PCIE8 PCIE9 BUS PCIE9 root-dom1 IOV PCIE10 BUS PCIE10 root-dom1 IOV PCIE11 BUS PCIE11 root-dom1 IOV PCIE12 BUS PCIE12 PCIE13 BUS PCIE13 root-dom1 IOV PCIE14 BUS PCIE14 root-dom1 IOV PCIE15 BUS PCIE15 root-dom1 IOV .... /BB1/CMUL/NET0 PCIE PCIE8 UNK /BB1/CMUL/SASHBA PCIE PCIE8 UNK /BB1/PCI0 PCIE PCIE9 root-dom1 OCC /BB1/PCI3 PCIE PCIE10 root-dom1 OCC /BB1/PCI4 PCIE PCIE10 root-dom1 OCC /BB1/PCI7 PCIE PCIE11 root-dom1 OCC /BB1/PCI8 PCIE PCIE11 root-dom1 OCC /BB1/CMUL/NET2 PCIE PCIE12 UNK /BB1/PCI1 PCIE PCIE13 root-dom1 OCC /BB1/PCI2 PCIE PCIE13 root-dom1 OCC /BB1/PCI5 PCIE PCIE14 root-dom1 OCC /BB1/PCI6 PCIE PCIE14 root-dom1 OCC /BB1/PCI9 PCIE PCIE15 root-dom1 OCC /BB1/PCI10 PCIE PCIE15 root-dom1 OCC
- b. Stop the root domain to which the root complexes in the SPARC M10-4S to be released are assigned and then release the SPARC M10-4S. The following example executes the ldm stop-domain and ldm unbind-domain commands to release the root domain (root-dom1) and shows that the root domain is in the inactive state.
# ldm stop-domain root-dom1 LDom root-dom1 stopped # ldm unbind-domain root-dom1 # ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- UART 64 56G 0.2% 4h 59m guest0 active -n---- 5100 64 64G 0.0% 1h 55m guest1 active -n---- 5101 64 64G 0.0% 1h 46m root-dom0 active -n--v- 5000 32 32G 0.0% 2h 29m root-dom1 inactive ------ 32 32G |
- c. Confirm that all of the I/O devices in the SPARC M10-4S to be replaced have been released. Execute the ldm list-io command to confirm that none of them is still assigned to a logical domain.
# ldm list-io NAME TYPE BUS DOMAIN STATUS ---- ---- --- ------ ------ PCIE0 BUS PCIE0 primary IOV PCIE1 BUS PCIE1 root-dom0 IOV PCIE2 BUS PCIE2 root-dom0 IOV PCIE3 BUS PCIE3 root-dom0 IOV PCIE4 BUS PCIE4 primary IOV PCIE5 BUS PCIE5 root-dom0 IOV PCIE6 BUS PCIE6 root-dom0 IOV PCIE7 BUS PCIE7 root-dom0 IOV PCIE8 BUS PCIE8 PCIE9 BUS PCIE9 PCIE10 BUS PCIE10 PCIE11 BUS PCIE11 PCIE12 BUS PCIE12 PCIE13 BUS PCIE13 PCIE14 BUS PCIE14 PCIE15 BUS PCIE15 (Omitted) |
- Manually reduce the number of CPU cores and the memory resources assigned to the logical domains.
This step explains how to manually reduce the number of CPU cores and the memory resources assigned to the logical domains, instead of specifying the -m unbind=resource option of the deleteboard command, so that the conditions for "Placement of CPU cores and memory" in "2.5.2 Considerations in System Operation for Dynamic Reconfiguration" are satisfied.
If the deleteboard command fails with an error, following this procedure may, in some cases, allow the deleteboard command to succeed.
If you specify the -m unbind=resource option of the deleteboard command, skip this step and proceed to the step that releases the system board (PSB) from the physical partition.
- a. Check the number of CPU cores and then delete them. Releasing the SPARC M10-4S reduces the number of CPU cores that can be used, so first apply the following procedure to reduce the number of CPU cores assigned to the logical domains.
- i. Check the number of CPU cores that will remain after the SPARC M10-4S has been released. From the XSCF, execute the showpparinfo command to check the number of CPU cores, excluding those of the SPARC M10-4S to be released. In the following example, the physical system board (PSB) with number 01-0 is to be released, so add up the CPU cores of the remaining PSB, 00-0: 16 + 16 + 16 + 16 = 64 cores.
XSCF> showpparinfo -p 0 PPAR#00 Information: -------------------- CPU(s) : 8 CPU Cores : 128 CPU Threads : 256 Memory size (GB) : 256 CoD Assigned (Cores) : 256 CPU(s): ------- PID PSB CPU# Cores Threads 00 00-0 0 16 32 00 00-0 1 16 32 00 00-0 2 16 32 00 00-0 3 16 32 00 01-0 0 16 32 00 01-0 1 16 32 00 01-0 2 16 32 00 01-0 3 16 32 (Omitted) |
- ii. Check the total number of CPU cores that are assigned to the logical domains. Execute the ldm list-devices -a core command. The number of rows with a value other than 100 in the %FREE column is the total number of CPU cores assigned to logical domains. In the following example, the ldm list-devices -a core command is executed, and then the rows are counted using the -p option. The result shows that a total of 112 cores are bound to logical domains.
# ldm list-devices -a core CORE ID %FREE CPUSET 0 0 (0, 1) 4 0 (8, 9) 8 0 (16, 17) 12 0 (24, 25) (Omitted) # ldm list-devices -a -p core | egrep -v "CORE|VERSION|free=100" | wc -l 112 |
- iii. Calculate the core shortfall that results from the release of the SPARC M10-4S. Use the following formula to calculate the CPU core shortfall that will result after the SPARC M10-4S is released.
CPU core shortfall = Number of cores used in logical domains (step ii) - Number of physical cores after release (step i)
For the example in steps i and ii, the shortfall is 112 cores (in use) - 64 cores (remaining) = 48 cores.
- iv. Decide which logical domains to delete CPU cores from if a CPU core shortfall occurs. If step iii shows that a CPU core shortfall will occur, CPU cores must be deleted from the logical domains. Execute the ldm list-domain command to check the number of CPU cores assigned to each logical domain in the active or bound state, and then decide which logical domains the CPU cores will be deleted from. In the following example, 32 cores (64 vcpus) are assigned to primary, 32 cores (64 vcpus) to guest0, 32 cores (64 vcpus) to guest1, and 16 cores (32 vcpus) to root-dom0. Because 48 cores need to be deleted, 16 cores will be deleted from each of primary, guest0, and guest1.
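The shortfall arithmetic from steps i to iii can be written out as a short calculation. This is only a sketch: the per-PSB core counts are typed in by hand from the showpparinfo example, and the function name core_shortfall is hypothetical.

```python
def core_shortfall(cores_by_psb: dict[str, list[int]],
                   psb_to_release: str,
                   cores_in_use: int) -> int:
    """Cores that must be removed from logical domains before the PSB
    can be released (0 if the remaining cores are already sufficient)."""
    remaining = sum(
        sum(cores) for psb, cores in cores_by_psb.items()
        if psb != psb_to_release
    )
    return max(0, cores_in_use - remaining)


# Values taken from the showpparinfo example: four 16-core CPUs per PSB.
cores_by_psb = {"00-0": [16, 16, 16, 16], "01-0": [16, 16, 16, 16]}

# 112 cores are bound to logical domains (step ii); 64 remain (step i).
print(core_shortfall(cores_by_psb, "01-0", 112))  # 48
```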
# ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- UART 64 56G 0.1% 18h 17m guest0 active -n---- 5100 64 64G 0.0% 15h 13m guest1 active -n---- 5101 64 64G 0.0% 15h 4m root-dom0 active -n--v- 5000 32 32G 0.0% 15h 47m root-dom1 inactive ------ 32 32G |
- v. Execute the ldm remove-core command to delete the CPU cores from the target logical domains. In the following example, 16 cores are deleted from each of primary, guest0, and guest1, and a check is made to confirm that they have actually been deleted.
# ldm remove-core 16 primary # ldm remove-core 16 guest0 # ldm remove-core 16 guest1 # ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv- UART 32 56G 0.0% 18h 19m guest0 active -n---- 5100 32 64G 0.0% 15h 15m guest1 active -n---- 5101 32 64G 0.0% 15h 5m root-dom0 active -n--v- 5000 32 32G 0.0% 15h 49m root-dom1 inactive ------ 5001 32 32G # ldm list-devices -a -p core | egrep -v "CORE|VERSION|free=100" | wc -l 64 |
- b. Check and delete memory resources. Releasing the SPARC M10-4S reduces the memory area that can be used, so use the procedure below to delete memory resources assigned to the logical domains.
- i. Check the use status of the continuous regions of memory (memory blocks). Execute the prtdiag command and the ldm list-devices -a memory command to check the memory blocks assigned to each logical domain, and to check which SPARC M10-4S each unassigned memory block belongs to. First, execute the prtdiag command to check the correspondence between the physical addresses in memory and the SPARC M10-4Ss.
# prtdiag (Omitted) ======================= Physical Memory Configuration ======================== Segment Table: -------------------------------------------------------------- Base Segment Interleave Bank Contains Address Size Factor Size Modules -------------------------------------------------------------- 0x7e0000000000 32 GB 4 8 GB /BB0/CMUL/CMP0/MEM00A (Omitted) 0x7c0000000000 32 GB 4 8 GB /BB0/CMUL/CMP1/MEM10A (Omitted) 0x7a0000000000 32 GB 4 8 GB /BB0/CMUU/CMP0/MEM00A (Omitted) 0x780000000000 32 GB 4 8 GB /BB0/CMUU/CMP1/MEM10A (Omitted) 0x760000000000 32 GB 4 8 GB /BB1/CMUL/CMP0/MEM00A (Omitted) 0x740000000000 32 GB 4 8 GB /BB1/CMUL/CMP1/MEM10A (Omitted) 0x720000000000 32 GB 4 8 GB /BB1/CMUU/CMP0/MEM00A (Omitted) 0x700000000000 32 GB 4 8 GB /BB1/CMUU/CMP1/MEM10A (Omitted) |
- The result of this example is rearranged in ascending order of physical addresses in memory. The following table lists the correspondence between the physical addresses and the SPARC M10-4S.
Base Address (Physical Address) | Building Block Configuration of SPARC M10-4S |
---|---|
0x700000000000 and after | BB1 |
0x720000000000 and after | BB1 |
0x740000000000 and after | BB1 |
0x760000000000 and after | BB1 |
0x780000000000 and after | BB0 |
0x7a0000000000 and after | BB0 |
0x7c0000000000 and after | BB0 |
0x7e0000000000 and after | BB0 |
- Then, execute the ldm list-devices -a memory command to check the continuous areas of memory (called memory blocks in the remainder of this document) assigned to the logical domains and any unassigned memory blocks. In the following example, the ldm list-devices -a memory command is executed. The meaning of each parameter is as follows.
  - PA: Start physical address of the memory block
  - SIZE: Size of the memory block
  - BOUND: Name of the logical domain to which the memory block is assigned. A blank indicates an unassigned area, and _sys_ indicates a control area that is not assigned to a logical domain.
# ldm list-devices -a memory MEMORY PA SIZE BOUND 0x700000000000 32G root-dom0 0x720000000000 32G 0x740000000000 32G guest0 0x760000800000 1272M _sys_ 0x760050000000 31488M guest1 0x780000000000 32G guest0 0x7a0000000000 32G guest1 0x7c0000000000 28G primary 0x7c0700000000 4G 0x7e0000800000 1272M _sys_ 0x7e0050000000 512M _sys_ 0x7e0070000000 256M _sys_ 0x7e0080000000 28G primary 0x7e0780000000 1280M guest1 0x7e07d0000000 768M |
- By combining the results shown above with the physical positions checked with the prtdiag command, it can be seen that the memory block usage status is as below.
Table A-6 Example of Memory Block Use Statuses
SPARC M10-4S | Physical Address | Size | Logical Domain |
---|---|---|---|
BB1 (target for replacement) | 0x700000000000 | 32 GB | root-dom0 |
BB1 (target for replacement) | 0x720000000000 | 32 GB | Unassigned |
BB1 (target for replacement) | 0x740000000000 | 32 GB | guest0 |
BB1 (target for replacement) | 0x760050000000 | 31,488 MB | guest1 |
BB0 | 0x780000000000 | 32 GB | guest0 |
BB0 | 0x7a0000000000 | 32 GB | guest1 |
BB0 | 0x7c0000000000 | 28 GB | primary |
BB0 | 0x7c0700000000 | 4 GB | Unassigned |
BB0 | 0x7e0080000000 | 28 GB | primary |
BB0 | 0x7e0780000000 | 1,280 MB | guest1 |
BB0 | 0x7e07d0000000 | 768 MB | Unassigned |
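The mapping from a memory block's start address to its building block can be sketched as a lookup against the segment base addresses reported by prtdiag. This is an illustrative sketch only: the segment list is typed in from the example, every segment is assumed to be 32 GB as in the example, and the name board_for is hypothetical.

```python
from bisect import bisect_right

SEGMENT_SIZE = 32 * 2**30  # each segment in the prtdiag example is 32 GB

# (base address, building block), in ascending order of base address.
SEGMENTS = [
    (0x700000000000, "BB1"), (0x720000000000, "BB1"),
    (0x740000000000, "BB1"), (0x760000000000, "BB1"),
    (0x780000000000, "BB0"), (0x7a0000000000, "BB0"),
    (0x7c0000000000, "BB0"), (0x7e0000000000, "BB0"),
]
BASES = [base for base, _ in SEGMENTS]


def board_for(pa: int) -> str:
    """Return the building block whose memory segment contains address pa."""
    i = bisect_right(BASES, pa) - 1  # last segment whose base is <= pa
    if i < 0 or pa - BASES[i] >= SEGMENT_SIZE:
        raise ValueError(f"address {pa:#x} is outside every known segment")
    return SEGMENTS[i][1]


print(board_for(0x760050000000))  # BB1 (31,488-MB block of guest1)
print(board_for(0x7e0780000000))  # BB0 (1,280-MB block of guest1)
```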
- ii. Check the size and quantity of the movement source memory blocks. While referring to the check results of the memory block use status, check the memory blocks (hereafter called "source memory blocks") assigned to logical domains on the SPARC M10-4S to be replaced. In "Table A-6 Example of Memory Block Use Statuses," memory blocks of 32 GB x 2 and 31,488 MB x 1 in BB1 are assigned to logical domains.
- iii. Determine the logical domains from which to delete memory, as well as the amounts. Check the locations of the memory blocks assigned to each logical domain, and make sure that the source memory blocks can be moved to unassigned memory blocks (called "destination memory blocks" in the remainder of this document) of the SPARC M10-4Ss that will not be released, by deleting memory in memory block units and by reducing the memory block size. Based on this, make a final decision as to how much memory to delete and from which logical domains to delete it.
- The following methods are supported.
  - Reduce the number of memory blocks that need to be moved by deleting, as a whole, the source memory blocks.
  - Increase the number of possible destinations by deleting, as a whole, the memory blocks assigned to the logical domains on the SPARC M10-4Ss that are not to be released.
  - Reduce the size of the source memory blocks so that they will fit into the available free area at the destinations.
  - Increase the number of free memory blocks at the destinations to make the movement possible, by reducing the size of the source memory blocks and by reducing the size of the memory blocks of the destinations that are in use.
Note - After the size is reduced, the free memory blocks will not be continuous (they will be fragmented). Even if multiple small memory blocks are deleted to increase the number of free areas, the free areas will not be continuous areas. If the continuous areas of the source memory blocks are large, movement will be impossible. In such a case, delete the source memory blocks to adjust their size.
Note - When considering this, the possibility of continuous areas being fragmented after deletion can be reduced by selecting deletable ones that have the same sizes as the existing memory blocks whenever possible. This increases the possibility of the success of the movement of memory blocks.
Note - If too much memory is deleted, this will place a burden on the memory in the logical domains, possibly causing problems such as Oracle Solaris hanging. Be careful not to delete too much; use the vmstat command and check the "free" size as a rough guide.
- Investigate the deletion plan based on "Table A-6 Example of Memory Block Use Statuses." As a result of this consideration, make a plan to delete 4 GB from root-dom0, 32 GB from guest0, 31,488 MB from guest1, and 28 GB from the primary as shown in "Table A-7 Memory Block Deletion Plan" as an example.
Table A-7 Memory Block Deletion Plan
SPARC M10-4S | Size | Logical Domain | Deletion Plan |
---|---|---|---|
BB1 (target for replacement) | 32 GB | root-dom0 | Reduce this area by 4 GB to 28 GB, delete 28 GB from the primary on BB0, and then implement the movement. |
BB1 (target for replacement) | 32 GB | Unassigned | - |
BB1 (target for replacement) | 32 GB | guest0 | Delete this because there is a 32-GB guest0 memory on BB0. |
BB1 (target for replacement) | 31,488 MB | guest1 | Delete this because there is a 32-GB guest1 memory on BB0. |
BB0 | 32 GB | guest0 | Leave it. |
BB0 | 32 GB | guest1 | Leave it. |
BB0 | 28 GB | primary | Leave it. |
BB0 | 4 GB | Unassigned | - |
BB0 | 28 GB | primary | Delete it to move 28 GB of root-dom0. |
BB0 | 1,280 MB | guest1 | Leave it. |
BB0 | 768 MB | Unassigned | - |
- iv. Manually delete memory from the logical domains. Delete memory from the logical domains, using the ldm remove-memory command, according to the memory deletion plan devised in step iii. The following example shows the execution of the commands for deleting memory according to "Table A-7 Memory Block Deletion Plan."
# ldm remove-memory 4G root-dom0 # ldm remove-memory 32G guest0 # ldm remove-memory 31488M guest1 # ldm remove-memory 28G primary |
- v. Check the states of the deleted memory blocks. Execute the ldm list-devices -a memory command and, referring to the results of the deletion, check whether the layout allows the memory blocks to be moved. If there are any memory blocks that cannot be moved, consider which additional memory blocks to delete based on the results, and then delete them. In the following example, it is easier to check whether movement is possible by comparing, side by side, the large blocks assigned on BB1 with the large free areas on BB0.
# ldm list-devices -a memory MEMORY PA SIZE BOUND (BB1) 0x700000000000 256M root-dom0 →Can be moved by dividing a 4-GB destination 0x700010000000 4G 0x700110000000 28416M root-dom0 →Can be moved to 0x780000000000 (28GB) 0x720000000000 32G 0x740000000000 256M guest0 →Can be moved by dividing a 4-GB destination 0x740010000000 4G 0x740110000000 28416M guest0 →Must be deleted again because there is no destination(*) 0x760000800000 1272M _sys_ 0x760050000000 256M guest1 →Can be moved by dividing a 4-GB area 0x760060000000 1792M 0x7600d0000000 29440M guest1 →Can be moved to 0x7a0000000000 (29GB) (BB0) 0x780000000000 28G ←Can be moved from root-dom0 (0x700110000000) 0x780700000000 4G guest0 0x7a0000000000 29G ←Can be moved from guest1 (0x7600d0000000) 0x7a0740000000 3G guest1 0x7c0000000000 256M primary 0x7c0010000000 4G ←Can be moved from 256 MB of root-dom0, guest0, or guest1 0x7c0110000000 24320M primary 0x7c0700000000 4G 0x7e0000800000 1272M _sys_ 0x7e0050000000 512M _sys_ 0x7e0070000000 256M _sys_ 0x7e0080000000 24G ←Not sufficient for guest0 (0x740110000000) to move(*) 0x7e0680000000 4G primary 0x7e0780000000 1280M guest1 0x7e07d0000000 768M
- In the above example, the destination marked with (*) has only 24 GB (24,576 MB) of free space, so delete 3,840 MB from the 28416-MB area (guest0) on the source (BB1), and then repeat the check. In the following example, it can be seen that all of the memory blocks can now be moved.
# ldm remove-memory 3840M guest0 # ldm list-devices -a memory MEMORY PA SIZE BOUND (BB1) 0x700000000000 256M root-dom0 0x700010000000 4G 0x700110000000 28416M root-dom0 0x720000000000 32G 0x740000000000 256M guest0 0x740010000000 7936M 0x740200000000 24G guest0 →Can be moved to 0x7e0080000000 (24G) 0x760000800000 1272M _sys_ 0x760050000000 256M guest1 0x760060000000 1792M 0x7600d0000000 29440M guest1 (BB0) 0x780000000000 28G 0x780700000000 4G guest0 0x7a0000000000 29G 0x7a0740000000 3G guest1 0x7c0000000000 256M primary 0x7c0010000000 4G 0x7c0110000000 24320M primary 0x7c0700000000 4G 0x7e0000800000 1272M _sys_ 0x7e0050000000 512M _sys_ 0x7e0070000000 256M _sys_ 0x7e0080000000 24G ←Can be moved from guest0 (0x740200000000) 0x7e0680000000 4G primary 0x7e0780000000 1280M guest1 0x7e07d0000000 768M |
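The side-by-side comparison above can be approximated with a simple best-fit check. This is a rough planning aid under stated assumptions, not a description of how the system actually relocates memory: it assumes that a free destination block can be divided to hold smaller source blocks (as the annotations in the example suggest), it ignores alignment and any other placement constraints, and the name unplaceable is hypothetical. Sizes are in MB and are typed in from the example output.

```python
def unplaceable(sources: list[tuple[str, int]],
                free_sizes: list[int]) -> list[tuple[str, int]]:
    """Best-fit, largest-first check of whether each source block (on the
    board to be released) fits into a free block on the remaining board.
    Returns the source blocks for which no destination was found."""
    free = sorted(free_sizes)
    failed = []
    for owner, size in sorted(sources, key=lambda s: s[1], reverse=True):
        fits = [i for i, f in enumerate(free) if f >= size]
        if not fits:
            failed.append((owner, size))
            continue
        free[fits[0]] -= size  # smallest free block that is large enough
        free.sort()
    return failed


# Free blocks on BB0 after the first round of deletions (MB).
FREE_BB0 = [28 * 1024, 29 * 1024, 4096, 4096, 24 * 1024, 768]

# Blocks bound to logical domains on BB1 after the first round (MB).
first = [("root-dom0", 256), ("root-dom0", 28416), ("guest0", 256),
         ("guest0", 28416), ("guest1", 256), ("guest1", 29440)]
print(unplaceable(first, FREE_BB0))   # [('guest0', 28416)]

# After removing another 3840 MB from guest0, its large block is 24 GB.
second = [("root-dom0", 256), ("root-dom0", 28416), ("guest0", 256),
          ("guest0", 24 * 1024), ("guest1", 256), ("guest1", 29440)]
print(unplaceable(second, FREE_BB0))  # []
```

Under these assumptions, the first layout leaves the 28,416-MB block of guest0 without a destination, and the second layout leaves nothing unplaced, which agrees with the annotations in the two examples.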
- Release the system board (PSB<BB>) of the SPARC M10-4S from the physical partition.
- a. Execute the deleteboard -c disconnect command to release the PSB from the physical partition.
XSCF> deleteboard -c disconnect 01-0 PSB#01-0 will be unconfigured from PPAR immediately. Continue?[y|n] :y Start unconfigure preparation of PSB. [1200sec] 0end Unconfigure preparation of PSB has completed. Start unconfiguring PSB from PPAR. [7200sec] 0..... 30..... 60....end Unconfigured PSB from PPAR. PSB power off sequence started. [1200sec] 0..... 30..... 60..... 90.....120.....150.....end Operation has completed. |
- b. Execute the showresult command to check the exit status of the deleteboard command that was just executed. An end value of 0 indicates the normal termination of the deleteboard command. If the end value is other than 0 or if an error message is displayed upon executing the deleteboard command, the deleteboard command has terminated abnormally. Referring to "C.1.2 deleteboard" based on the error message, identify the error and then take corrective action.
XSCF> showresult 0 |
- c. Execute the showboards command to check the PSB status. Confirm that the PSB in the SPARC M10-4S to be replaced is in the "Assigned" state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0 PSB PPAR-ID(LSB) Assignment Pwr Conn Conf Test Fault ---- ------------ ----------- ---- ---- ---- ------- -------- 00-0 00(00) Assigned y y y Passed Normal 01-0 00(01) Assigned n n n Passed Normal |
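The confirmation in this step can be sketched as a check on captured showboards output. This is a minimal illustration that assumes the eight whitespace-separated columns shown in the example; the function name psb_released is hypothetical.

```python
def psb_released(listing: str, psb: str) -> bool:
    """True if the PSB is Assigned and Pwr, Conn, and Conf are all "n"."""
    for line in listing.strip().splitlines():
        fields = line.split()
        # Columns: PSB, PPAR-ID(LSB), Assignment, Pwr, Conn, Conf, Test, Fault
        if len(fields) >= 6 and fields[0] == psb:
            return fields[2] == "Assigned" and fields[3:6] == ["n", "n", "n"]
    raise ValueError(f"PSB {psb} not found in the listing")


# Sample text copied from the example output in this section.
SAMPLE = """\
PSB PPAR-ID(LSB) Assignment Pwr Conn Conf Test Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00) Assigned y y y Passed Normal
01-0 00(01) Assigned n n n Passed Normal
"""

print(psb_released(SAMPLE, "01-0"))  # True
print(psb_released(SAMPLE, "00-0"))  # False
```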
- Execute the replacefru command to replace the SPARC M10-4S.
XSCF> replacefru |
Note - For details on the replacement of SPARC M10-4Ss by using the replacefru command, see "5.8 Releasing a SPARC M10-4/M10-4S FRU from the System with the replacefru Command" and "6.2 Incorporating a SPARC M10-4/M10-4S FRU into the System with the replacefru Command" in the Fujitsu M10-4/Fujitsu M10-4S/SPARC M10-4/SPARC M10-4S Service Manual.
- Incorporate the PSB into the physical partition.
- a. Execute the showboards command to check the PSB status. Confirm that the PSB in the replaced SPARC M10-4S is in the "Assigned" state and that the [Pwr], [Conn], and [Conf] columns all show "n."
XSCF> showboards -p 0 PSB PPAR-ID(LSB) Assignment Pwr Conn Conf Test Fault ---- ------------ ----------- ---- ---- ---- ------- -------- 00-0 00(00) Assigned y y y Passed Normal 01-0 00(01) Assigned n n n Passed Normal |
- b. Execute the addboard -c configure command to incorporate the PSB into the physical partition. To recover the original logical domain configuration, execute the addboard -c configure command with the -m bind=resource option specified.
XSCF> addboard -c configure -m bind=resource -p 0 01-0 PSB#01-0 will be configured into PPAR-ID 0. Continue?[y|n] :y Start connecting PSB to PPAR. [3600sec] 0..... 30..... 60..... 90.....120.....150.....180.....210.....240..... 270.....300.....330.....360.....390.....420.....450.....480.....510..... 540.....570.....600.....630.....660.....690.....720.....750.....780..... 810.....840.....870.....900.....930.....960.....end Connected PSB to PPAR. Start configuring PSB to Logical Domains (LDoms) Manager. [1800sec] 0.....end Configured PSB to Logical Domains (LDoms) Manager. Operation has completed. |
Note - If an error message appears during execution of the addboard command, see "C.1.1 addboard," and then identify the error and take corrective action.
- c. Execute the showresult command to check the exit status of the addboard command that was just executed. An end value of 0 indicates the normal termination of the addboard command. If the end value is other than 0 or if an error message is displayed upon executing the addboard command, the addboard command has terminated abnormally. Referring to "C.1.1 addboard" based on the error message, identify the error and then take corrective action.
XSCF> showresult
0
- d. Execute the showboards command to check the PSB status. Confirm that both the [Conn] and [Conf] columns show "y" after the PSB in the replaced SPARC M10-4S has been successfully incorporated.
XSCF> showboards -p 0
PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    y    y    y    Passed  Normal
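If the showboards output is captured on a management host (for example, in a log of the XSCF session), the checks in steps a and d can be sketched as a small filter. This is only a minimal sketch and not an XSCF feature: psb_flags is a hypothetical helper name, and the script assumes the eight-column layout shown in the example output.

```shell
# psb_flags (hypothetical helper): read showboards output on stdin and print
# the Pwr, Conn, and Conf values for the PSB given as the first argument.
# Assumes the column order PSB, PPAR-ID(LSB), Assignment, Pwr, Conn, Conf, ...
psb_flags() {
    awk '$1 == "'"$1"'" { print $4, $5, $6 }'
}

# Sample input copied from the example output in step d.
sample='PSB  PPAR-ID(LSB) Assignment  Pwr  Conn Conf Test    Fault
---- ------------ ----------- ---- ---- ---- ------- --------
00-0 00(00)       Assigned    y    y    y    Passed  Normal
01-0 00(01)       Assigned    y    y    y    Passed  Normal'

if [ "$(printf '%s\n' "$sample" | psb_flags 01-0)" = "y y y" ]; then
    echo "PSB 01-0: powered on, connected, and configured"
else
    echo "PSB 01-0: not yet incorporated"
fi
```

Before the addboard command is executed, the same filter is expected to print "n n n" for PSB 01-0.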
- Check the logical domain operation status.
- a. Execute the console command to connect to the console of the control domain and then log in to it.
XSCF> console -p 0
- b. Execute the ldm list-domain command to confirm that the logical domain operation status has not changed after the addition of the SPARC M10-4S PSB (BB). To check the logical domain operation status, check the combination of [STATE] and [FLAGS]. If [STATE] indicates "active", the second character from the left of the string in [FLAGS] has the following meaning.
"n": Oracle Solaris is operating
"t": OpenBoot PROM status
"-": In another state (including [STATE] other than "active")
# ldm list-domain
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    32    28G      64%   2h 54m
guest0           active     -n----  5100    32    61876M   42%   2h 54m
guest1           active     -n----  5101    32    62388M   11%   2h 54m
root-dom0        active     -n--v-  5000    32    28G      0.0%  2h 54m
root-dom1        inactive   ------          32    32G      0.0%  2h 54m
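The [STATE] and [FLAGS] rule described in step b can be sketched as a filter over the ldm list-domain output. This is a minimal sketch under stated assumptions: domain_os_state is a hypothetical helper name, the labels "solaris", "openboot", and "other" are chosen here for illustration, and the column layout is assumed to match the example.

```shell
# domain_os_state (hypothetical helper): read ldm list-domain output on stdin
# and print each domain name with a label derived from STATE and from the
# second character of FLAGS, following the rule in step b.
domain_os_state() {
    awk 'NR > 1 {
        c = substr($3, 2, 1)           # second character of FLAGS
        if ($2 != "active") s = "other"
        else if (c == "n")  s = "solaris"
        else if (c == "t")  s = "openboot"
        else                s = "other"
        print $1, s
    }'
}

# Sample input copied from the example output in step b.
sample='NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    32    28G      64%   2h 54m
guest0           active     -n----  5100    32    61876M   42%   2h 54m
guest1           active     -n----  5101    32    62388M   11%   2h 54m
root-dom0        active     -n--v-  5000    32    28G      0.0%  2h 54m
root-dom1        inactive   ------          32    32G      0.0%  2h 54m'

printf '%s\n' "$sample" | domain_os_state
```

On the control domain, the filter would be fed from the command itself, for example ldm list-domain | domain_os_state. Comparing its output before and after the addboard command is one way to confirm that the operation status has not changed.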
- Restore the deleted resources.
If the resources assigned to the logical domains have been deleted with the deleteboard command, add the resources with the ldm add-core and ldm add-memory commands to restore them.
In the following example, the deleted CPU cores and memory are added back to restore the resource allocation that was in place before the replacement of the SPARC M10-4S.
# ldm add-core 16 primary
# ldm add-core 16 guest0
# ldm add-core 16 guest1
# ldm add-memory 28G primary
# ldm add-memory 4G root-dom0
# ldm add-memory 36608M guest0
# ldm add-memory 31488M guest1
# ldm list-domain
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    64    56G      0.2%  4h 59m
guest0           active     -n----  5100    64    64G      0.0%  1h 55m
guest1           active     -n----  5101    64    64G      0.0%  1h 46m
root-dom0        active     -n--v-  5000    32    32G      0.0%  2h 29m
root-dom1        inactive   ------          32    32G
- Restart the use of the I/O devices.
- a. Reassign root complexes. Execute the ldm bind-domain and ldm start-domain commands to start the root domain that is in the unbound state and to which the root complexes in the replaced SPARC M10-4S were assigned. The following example starts the root domain (root-dom1) from the unbound state and confirms that it has started.
# ldm bind-domain root-dom1
# ldm start-domain root-dom1
LDom root-dom1 started
# ldm list-domain
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    64    56G      0.2%  3h 8m
guest0           active     -n----  5100    64    64G      0.0%  3h 8m
guest1           active     -n----  5101    64    64G      0.0%  3h 8m
root-dom0        active     -n--v-  5000    32    32G      0.0%  3h 8m
root-dom1        active     -n--v-  5001    32    32G      7.3%  8s
- Execute the ldm list-io command to confirm that the physical I/O devices are assigned to the root domain that has just started.
# ldm list-io
NAME                 TYPE   BUS      DOMAIN     STATUS
----                 ----   ---      ------     ------
PCIE0                BUS    PCIE0    primary    IOV
PCIE1                BUS    PCIE1    root-dom0  IOV
PCIE2                BUS    PCIE2    root-dom0  IOV
PCIE3                BUS    PCIE3    root-dom0  IOV
PCIE4                BUS    PCIE4    primary    IOV
PCIE5                BUS    PCIE5    root-dom0  IOV
PCIE6                BUS    PCIE6    root-dom0  IOV
PCIE7                BUS    PCIE7    root-dom0  IOV
PCIE8                BUS    PCIE8
PCIE9                BUS    PCIE9    root-dom1  IOV
PCIE10               BUS    PCIE10   root-dom1  IOV
PCIE11               BUS    PCIE11   root-dom1  IOV
PCIE12               BUS    PCIE12
PCIE13               BUS    PCIE13   root-dom1  IOV
PCIE14               BUS    PCIE14   root-dom1  IOV
PCIE15               BUS    PCIE15   root-dom1  IOV
....
/BB1/CMUL/NET0       PCIE   PCIE8               UNK
/BB1/CMUL/SASHBA     PCIE   PCIE8               UNK
/BB1/PCI0            PCIE   PCIE9    root-dom1  OCC
/BB1/PCI3            PCIE   PCIE10   root-dom1  OCC
/BB1/PCI4            PCIE   PCIE10   root-dom1  OCC
/BB1/PCI7            PCIE   PCIE11   root-dom1  OCC
/BB1/PCI8            PCIE   PCIE11   root-dom1  OCC
/BB1/CMUL/NET2       PCIE   PCIE12              UNK
/BB1/PCI1            PCIE   PCIE13   root-dom1  OCC
/BB1/PCI2            PCIE   PCIE13   root-dom1  OCC
/BB1/PCI5            PCIE   PCIE14   root-dom1  OCC
/BB1/PCI6            PCIE   PCIE14   root-dom1  OCC
/BB1/PCI9            PCIE   PCIE15   root-dom1  OCC
/BB1/PCI10           PCIE   PCIE15   root-dom1  OCC
- b. Add the virtual I/O devices from the root domain to the guest domains. Execute the ldm add-vdisk and ldm add-vnet commands to add, to each guest domain, the virtual disk (vdisk) and virtual network device (vnet) that use the virtual I/O service of the started root domain. The following example adds the virtual disk (vdisk10) and virtual network device (vnet10) that use the virtual I/O service of the BB#01 root domain (root-dom1).
# ldm add-vdisk id=1 vdisk10 vol10@vds1 guest0
# ldm add-vnet id=1 vnet10 vsw10 guest0
- Perform the same addition for the guest domain (guest1).
Note - To add a virtual I/O device again, you must specify the ID that was previously assigned to it. You can check the ID in the output of the ldm list -l command obtained before the virtual I/O device was deleted.
- c. Incorporate the virtual I/O devices assigned to a guest domain into the redundant configuration. Once the root domain (root-dom1) to which root complexes in BB#01 have been assigned has started, the virtual I/O device services corresponding to each guest domain are also started. Log in to each guest domain, and then incorporate the virtual I/O devices from root-dom1, which were previously removed from the redundant configuration, back into it. For details on how to use the redundant configuration software, see the documentation for that software.
- The following describes an example of incorporating a virtual network device (vnet1) into the IPMP configuration. For details on the commands, see the manual for Oracle Solaris. First, log in to the guest domain (guest0).
# ldm list-domain
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    64    56G      0.0%  4h 17m
guest0           active     -n----  5100    64    64G      0.0%  1h 13m
guest1           active     -n----  5101    64    64G      0.0%  1h 4m
root-dom0        active     -n--v-  5000    32    32G      0.0%  1h 47m
root-dom1        active     -n--v-  5001    32    32G      0.0%  1h 19m
# telnet localhost 5100
....
guest0#
- Execute the dladm show-phys command to check the mapping between the virtual network interface (vnet1) and the network interface name (net1).
guest0# dladm show-phys
LINK      MEDIA       STATE   SPEED   DUPLEX    DEVICE
net0      Ethernet    up      0       unknown   vnet0
net1      Ethernet    up      0       unknown   vnet1
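The lookup from a device name to its link name can be sketched as a filter over the dladm show-phys output. This is a minimal sketch: link_for_device is a hypothetical helper name, and the script assumes that DEVICE is the last column, as in the example.

```shell
# link_for_device (hypothetical helper): read dladm show-phys output on stdin
# and print the LINK name whose DEVICE column matches the first argument.
link_for_device() {
    awk 'NR > 1 && $NF == "'"$1"'" { print $1 }'
}

# Sample input copied from the example output above.
sample='LINK      MEDIA       STATE   SPEED   DUPLEX    DEVICE
net0      Ethernet    up      0       unknown   vnet0
net1      Ethernet    up      0       unknown   vnet1'

link=$(printf '%s\n' "$sample" | link_for_device vnet1)
echo "vnet1 is visible as link: $link"
```

In the guest domain, the filter would be fed from dladm show-phys directly. The resulting link name (net1 in this example) is the name to pass to the ipadm commands in the next step.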
- Execute the ipadm create-ip command, the ipadm set-ifprop command, and the ipadm add-ipmp command to register net1 as a standby device of ipmp0.
guest0# ipadm create-ip net1
guest0# ipadm set-ifprop -p standby=on -m ip net1
guest0# ipadm add-ipmp -i net1 ipmp0
- Execute the ipmpstat -i command to confirm that STATE indicates "ok" for each network interface in the IPMP configuration.
guest0# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -smbM--   up        disabled  ok
net1        no      ipmp0       -s---d-   up        disabled  ok
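The STATE check can be sketched as a filter that lists every interface whose state is not "ok". This is a minimal sketch: ipmp_not_ok is a hypothetical helper name, and the script assumes that STATE is the last column of the ipmpstat -i output, as in the example.

```shell
# ipmp_not_ok (hypothetical helper): read ipmpstat -i output on stdin and
# print the name of every interface whose STATE column is not "ok".
ipmp_not_ok() {
    awk 'NR > 1 && $NF != "ok" { print $1 }'
}

# Sample input copied from the example output above.
sample='INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
net0        yes     ipmp0       -smbM--   up        disabled  ok
net1        no      ipmp0       -s---d-   up        disabled  ok'

bad=$(printf '%s\n' "$sample" | ipmp_not_ok)
if [ -z "$bad" ]; then
    echo "all IPMP interfaces report ok"
else
    echo "interfaces not ok: $bad"
fi
```

An empty result means that every listed interface reports ok. In the guest domain, the filter would be fed from ipmpstat -i directly.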
- Perform the same step for the other guest domain (guest1).
- Restore the system volume and I/O devices on the control domain to a redundant configuration.
- a. Add the root complex configuration for the control domain. Add the root complexes in BB#01 that were previously removed from the control domain.
- Execute the ldm list-io command to check the unassigned root complexes. The following example shows that PCIE8 and PCIE12, the root complexes for the BB1 devices, are not assigned.
# ldm list-io
NAME                 TYPE   BUS      DOMAIN     STATUS
----                 ----   ---      ------     ------
PCIE0                BUS    PCIE0    primary    IOV
PCIE1                BUS    PCIE1    root-dom0  IOV
PCIE2                BUS    PCIE2    root-dom0  IOV
PCIE3                BUS    PCIE3    root-dom0  IOV
PCIE4                BUS    PCIE4    primary    IOV
PCIE5                BUS    PCIE5    root-dom0  IOV
PCIE6                BUS    PCIE6    root-dom0  IOV
PCIE7                BUS    PCIE7    root-dom0  IOV
PCIE8                BUS    PCIE8
PCIE9                BUS    PCIE9    root-dom1  IOV
PCIE10               BUS    PCIE10   root-dom1  IOV
PCIE11               BUS    PCIE11   root-dom1  IOV
PCIE12               BUS    PCIE12
PCIE13               BUS    PCIE13   root-dom1  IOV
(Omitted)
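Picking out the unassigned root complexes from a long ldm list-io listing can be sketched as a filter. This is a minimal sketch: unassigned_buses is a hypothetical helper name, and the script assumes that an unassigned bus is shown with empty DOMAIN and STATUS columns, as PCIE8 and PCIE12 are in the example.

```shell
# unassigned_buses (hypothetical helper): read ldm list-io output on stdin and
# print the name of every BUS row that has no DOMAIN and no STATUS value,
# that is, a row with only the NAME, TYPE, and BUS fields.
unassigned_buses() {
    awk '$2 == "BUS" && NF == 3 { print $1 }'
}

# Sample input: an excerpt of the example output above.
sample='NAME                 TYPE   BUS      DOMAIN     STATUS
----                 ----   ---      ------     ------
PCIE7                BUS    PCIE7    root-dom0  IOV
PCIE8                BUS    PCIE8
PCIE9                BUS    PCIE9    root-dom1  IOV
PCIE12               BUS    PCIE12
PCIE13               BUS    PCIE13   root-dom1  IOV'

printf '%s\n' "$sample" | unassigned_buses
```

For this excerpt the filter prints PCIE8 and PCIE12, one per line. Before passing any name to ldm add-io, it should still be confirmed against the full listing.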
- Execute the ldm add-io command to add PCIE8 and PCIE12 to primary.
# ldm add-io PCIE8 primary
# ldm add-io PCIE12 primary
- Execute the ldm list-io command to confirm that the root complexes in BB#01 have been added to the control domain.
# ldm list-io | grep primary
PCIE0                BUS    PCIE0    primary    IOV
PCIE4                BUS    PCIE4    primary    IOV
PCIE8                BUS    PCIE8    primary    IOV
PCIE12               BUS    PCIE12   primary    IOV
/BB1/CMUL/NET0       PCIE   PCIE8    primary    OCC
/BB1/CMUL/SASHBA     PCIE   PCIE8    primary    OCC
/BB1/CMUL/NET2       PCIE   PCIE12   primary    OCC
- b. Place the system volume in the control domain in a redundant configuration. Execute the zpool status command in the control domain to check the mirroring configuration status. The following example configures the ZFS mirroring function for the system volume in the control domain.
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 29.1M in 0h0m with 0 errors on Thu Jan 23 17:27:59 2014
config:
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c2t50000393E802CCE2d0s0  ONLINE       0     0     0
errors: No known data errors
- Execute the zpool attach command to incorporate the disks into a mirroring configuration.
# zpool attach rpool c2t50000393E802CCE2d0s0 c3t50000393A803B13Ed0s0
Make sure to wait until resilver is done before rebooting.
#
- Execute the zpool status command, and then confirm that the mirroring configuration has been established. Also use the zpool status command to check whether synchronization processing (resilvering) has completed. The following shows an example of the display during synchronization processing.
# zpool status rpool
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Mon Jan 27 15:55:47 2014
    21.1G scanned out of 70.6G at 120M/s, 0h7m to go
    21.0G resilvered, 29.84% done
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  DEGRADED     0     0     0  (resilvering)
errors: No known data errors
- Once synchronization processing is complete, the displayed screen will be as follows:
# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 70.6G in 0h9m with 0 errors on Mon Jan 27 16:05:34 2014
config:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c2t50000393E802CCE2d0s0  ONLINE       0     0     0
            c3t50000393A803B13Ed0s0  ONLINE       0     0     0
errors: No known data errors
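Telling the two displays apart can be sketched as a filter over the zpool status output. This is a minimal sketch: resilver_state is a hypothetical helper name, the labels it prints are chosen here for illustration, and the script relies only on the "state:" line and the "resilver in progress" text that appear in the examples.

```shell
# resilver_state (hypothetical helper): read zpool status output on stdin and
# print "resilvering" while synchronization is running, "done" when the pool
# is ONLINE with no resilver in progress, or "check: <state>" otherwise.
resilver_state() {
    awk '
        /resilver in progress/ { busy = 1 }
        $1 == "state:"         { state = $2 }
        END {
            if (busy)                   print "resilvering"
            else if (state == "ONLINE") print "done"
            else                        print "check: " state
        }'
}

# Sample input: an excerpt of the example output shown after completion.
resilver_state <<'EOF'
  pool: rpool
 state: ONLINE
  scan: resilvered 70.6G in 0h9m with 0 errors on Mon Jan 27 16:05:34 2014
errors: No known data errors
EOF
```

On the control domain, the filter would be fed from zpool status rpool, and the command could be repeated at intervals until it no longer prints "resilvering". A "check:" result means that the pool is neither resilvering nor ONLINE and should be examined manually.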
- If you are using other devices in BB#01, establish a redundant configuration or resume the use of the devices. For details on how to establish a redundant configuration or resume the use of devices, see the documentation for the redundant configuration software and for Oracle Solaris.