HDS multipath software (HDLM) failover and failback test

Original post. Category: Linux operating systems. Author: yangzhangyue. Date: 2013-08-07 13:25:47

Terminal 1

[16:26:24 root@localhost modprobe.d]# dd if=/dev/zero of=/dev/sddlmaa1

233408833+0 records in

233408833+0 records out

119505322496 bytes (120 GB) copied, 701.094 s, 170 MB/s
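
As a quick sanity check on the dd figures above, the sketch below recomputes the byte count and the throughput. It assumes dd used its default 512-byte block size (no bs= was given) and that dd reports decimal megabytes; the function name is only illustrative.

```python
# Figures copied from the dd output above.
records = 233408833          # "records out"
total_bytes = 119505322496   # bytes copied
elapsed_s = 701.094          # elapsed seconds

BLOCK_SIZE = 512  # assumption: dd's default block size, since no bs= was given


def dd_throughput_mb_s(n_bytes, seconds):
    """Return throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return n_bytes / seconds / 1e6


rate = dd_throughput_mb_s(total_bytes, elapsed_s)
print(f"records * block size = {records * BLOCK_SIZE} bytes")
print(f"throughput = {rate:.1f} MB/s")
```

Both values agree with what dd printed: the record count times 512 equals the byte total, and the rate rounds to 170 MB/s.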

 

 

Terminal 2

[16:31:49 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000002

PathStatus   IO-Count    IO-Errors

Online       235218946   0        

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own    75625355          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   159593591          0    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:32:32

 

Terminal 3

Linux 2.6.32-220.el6.x86_64 (localhost.localdomain)     08/05/2013      _x86_64_        (8 CPU)

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.09    0.01    2.26    0.42    0.00   97.22

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.55        67.96         8.32     476638      58346

sdb             194.79         1.24     10897.81       8728   76431871

sdc             233.01         1.06     22918.89       7416  160741878

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.87    0.00   22.01    4.24    0.00   72.88

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.20         0.00         2.40          0         24

sdb            1020.40         0.00    206470.20          0    2064702

sdc             966.10         0.00    138000.00          0    1380000

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.90    0.00   21.15    4.78    0.00   73.17

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb             978.30         0.00    149055.20          0    1490552

sdc            1011.50         0.00    189052.40          0    1890524

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.90    0.00   21.14    4.08    0.00   73.88

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               2.20       156.00         1.60       1560         16

sdb            1033.80         0.00    152442.40          0    1524424

sdc            1070.40         0.00    190097.60          0    1900976

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.84    0.00   22.66    3.86    0.00   72.65

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.30        17.58         4.00        176         40

sdb             233.97         0.00     46331.27          0     463776

sdc             680.22         0.00    296334.77          0    2966311

 

Terminal 2

Take one Fibre Channel HBA offline

[16:32:32 root@localhost bin]# ./dlnkmgr offline -hba 0007.0000

KAPL01055-I All the paths which pass the specified HBA will be changed to the Offline(C) status. Is this OK? [y/n]:y

KAPL01056-I If you are sure that there would be no problem when all the paths which pass the specified HBA are placed in the Offline(C) status, enter y. Otherwise, enter n. [y/n]:y

KAPL01061-I 1 path(s) were successfully placed Offline(C); 0 path(s) were not. Operation name = offline

[16:33:10 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000001

PathStatus   IO-Count    IO-Errors

Reduced      252808586   0        

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Offline(C) Own    81976455          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   170832131          0    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:33:24
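
When this check is repeated many times, it can help to extract the per-path status from the `dlnkmgr view -path` listing automatically. The following is a minimal sketch, not an HDLM API: `PathRow` and `parse_path_rows` are hypothetical names, and the parser assumes the column layout shown in the output above (it indexes from the right because the DskName column contains spaces).

```python
from typing import NamedTuple


class PathRow(NamedTuple):
    path_id: str
    status: str
    io_count: int
    io_errors: int
    hdev: str


def parse_path_rows(text):
    """Parse the per-path rows of a `dlnkmgr view -path` listing."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a 6-digit PathID; skip headers and messages.
        if len(fields) < 7 or len(fields[0]) != 6 or not fields[0].isdigit():
            continue
        # DskName contains spaces, so count columns from the right-hand side:
        # ... Status Type IO-Count IO-Errors DNum HDevName
        rows.append(PathRow(fields[0], fields[-6], int(fields[-4]),
                            int(fields[-3]), fields[-1]))
    return rows


# Rows copied from the listing taken after the HBA was placed offline.
SAMPLE = """\
Paths:000002 OnlinePaths:000001
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Offline(C) Own    81976455          0    0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   170832131          0    0 sddlmaa
"""

rows = parse_path_rows(SAMPLE)
not_online = [r.path_id for r in rows if r.status != "Online"]
print("paths not online:", not_online)
```

With the sample rows above, only path 000000 is reported as not online, matching the Offline(C) status in the listing.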

 

Terminal 3:

Check iostat: traffic on sdb has dropped to 0, while sdc's Blk_wrtn (3421184) has nearly doubled.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.81    0.00   23.19    2.30    0.00   73.70

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.20         0.00         2.40          0         24

sdb               0.00         0.00         0.00          0          0

sdc             334.00         0.00    342118.40          0    3421184

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.84    0.00   23.48    2.26    0.00   73.42

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.20         0.00         2.40          0         24

sdb               0.00         0.00         0.00          0          0

sdc             335.60         0.00    343552.00          0    3435520

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.83    0.00   23.18    2.39    0.00   73.60

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.60         0.00         4.80          0         48

sdb               0.00         0.00         0.00          0          0

sdc             335.70         0.00    343859.20          0    3438592

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.84    0.00   23.33    2.47    0.00   73.36

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0

sdc             334.60         0.00    342630.40          0    3426304

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.86    0.00   23.03    2.32    0.00   73.80

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0

sdc             334.80         0.00    342835.20          0    3428352

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.80    0.00   22.68    3.51    0.00   73.01

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb               0.00         0.00         0.00          0          0

sdc             335.10         0.00    343040.00          0    3430400

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.87    0.00   22.52    3.20    0.00   73.41
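
To see whether the manual offline cost any throughput, the write rates of the two paths can be added up before and after the switch. This is a rough sketch using one iostat sample from each phase; it assumes iostat's "blocks" are 512-byte sectors, which is the usual case on 2.6 kernels but is not stated in the log.

```python
SECTOR_BYTES = 512  # assumption: iostat "blocks" are 512-byte sectors


def mb_per_s(blocks_per_s):
    """Convert a Blk_wrtn/s figure to decimal megabytes per second."""
    return blocks_per_s * SECTOR_BYTES / 1e6


# Blk_wrtn/s values copied from the iostat samples in this post.
before = {"sdb": 149055.20, "sdc": 189052.40}  # both paths online
after = {"sdb": 0.00, "sdc": 342118.40}        # HBA 0007.0000 offline

total_before = mb_per_s(sum(before.values()))
total_after = mb_per_s(sum(after.values()))
sdc_growth = after["sdc"] / before["sdc"]

print(f"total before: {total_before:.1f} MB/s")
print(f"total after:  {total_after:.1f} MB/s")
print(f"sdc grew by a factor of {sdc_growth:.2f}")
```

Under that assumption the aggregate rate stays at roughly the same level (about 173 MB/s before and 175 MB/s after), so the surviving path absorbed the full load.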

          

Terminal 2

Bring the offlined HBA back online

[16:33:24 root@localhost bin]# ./dlnkmgr online -hba 0007.0000

KAPL01057-I All the paths which pass the specified HBA will be changed to the Online status. Is this OK? [y/n]:y

KAPL01061-I 1 path(s) were successfully placed Online; 0 path(s) were not. Operation name = online

[16:34:20 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000002

PathStatus   IO-Count    IO-Errors

Online       272845735   0        

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own    82274955          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   190570780          0    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:34:22

 

Terminal 3

Check the I/O again: the load is spread across sdb and sdc.

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               3.40       239.20         7.20       2392         72

sdb             801.90         0.00    118380.00          0    1183800

sdc             922.40         0.00    224718.50          0    2247185

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.81    0.00   21.67    4.00    0.00   73.52

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb            1127.90         0.00    170607.40          0    1706074

sdc            1105.10         0.00    147145.60          0    1471456

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.86    0.00   22.77    2.70    0.00   73.68

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb            1952.10         0.00    176125.40          0    1761254

sdc            1992.30         0.00    184086.40          0    1840864

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.73    0.00   23.05    3.05    0.00   73.17

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.60         0.00         4.80          0         48

sdb            2100.40         0.00    174668.40          0    1746684

sdc            2152.80         0.00    176666.80          0    1766668

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.88    0.00   22.60    3.07    0.00   73.45

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               3.40        48.00        11.20        480        112

sdb            1108.10         0.00    155167.60          0    1551676

sdc            1196.50         0.00    188496.00          0    1884960

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.84    0.00   23.66    2.62    0.00   72.88

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.00         0.00         0.00          0          0

sdb            1174.20         0.00    185929.40          0    1859294

sdc            1074.30         0.00    155898.00          0    1558980

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.88    0.00   23.17    2.48    0.00   73.47

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.20         0.00         2.40          0         24

sdb            1189.70         0.00    185251.80          0    1852518

sdc            1100.80         0.00    157490.00          0    1574900

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.88    0.00   23.83    2.45    0.00   72.84

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.00         0.00       205.60          0       2056

sdb            1249.40         0.00    187183.10          0    1871831

sdc            1113.00         0.00    155370.00          0    1553700

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.78    0.00   22.82    3.03    0.00   73.38

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.10         0.00        10.40          0        104

sdb            1541.30         0.00    176036.70          0    1760367

sdc            1441.80         0.00    151576.90          0    1515769

 

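A manual switch does not affect I/O.

However, if the fibre is physically unplugged, reads and writes are still affected until the path check completes.

Terminal 1

[16:39:45 root@localhost modprobe.d]# dd if=/dev/zero of=/dev/sddlmaa1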

 

Terminal 2

[16:48:05 root@localhost ~]# iostat 10 50 >iostat.log

 

Unplug one fibre cable

Terminal 3

[16:49:38 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000002

PathStatus   IO-Count    IO-Errors

Online       387300351   0        

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   137667240          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   249633111          0    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:38

[16:49:38 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000001

PathStatus   IO-Count    IO-Errors

Reduced      387337873   22185    

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   137704762          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Offline(E) Own   249633111      22185    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:39

[16:49:39 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000001

PathStatus   IO-Count    IO-Errors

Reduced      387450196   24029    

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   137817085          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Offline(E) Own   249633111      24029    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:40

 

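After a while, the multipath software detects that one path has changed to Offline(E).

In the iostat log, I/O traffic drops to 0 for roughly 40-50 s. I/O resumes only after the software has confirmed that the remaining path is healthy.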

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.00         0.00         8.80          0         88

sdb             836.10         0.00    114120.00          0    1141200

sdc             850.80         0.00    160197.20          0    1601972

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.07   25.38    0.00   74.55

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.50         0.80         6.40          8         64

sdb               0.00         0.00         0.00          0          0

sdc               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    0.09   24.59    0.00   75.32

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.30         0.00         3.20          0         32

sdb               0.00         0.00         0.00          0          0

sdc               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.01    0.00    0.11   25.46    0.00   74.41

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda              10.90       525.60         4.00       5256         40

sdb               0.00         0.00         0.00          0          0

sdc               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.16    0.00    0.25   24.48    0.00   75.11

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.30        51.20        11.20        512        112

sdb               0.00         0.00         0.00          0          0

sdc               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.93    0.00   16.15   11.85    0.00   71.07

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               5.90       163.20       210.40       1632       2104

sdb           19102.90         0.00    125551.50          0    1255515
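
The length of the stall can be estimated from the iostat log by counting consecutive samples in which neither path wrote anything. The sketch below uses the 10-second interval from the `iostat 10 50` invocation; `longest_stall` is an illustrative helper, and because the last sample lists only sdb, sdc is recorded as 0.0 there by assumption.

```python
INTERVAL_S = 10  # from the `iostat 10 50` invocation

# (sdb, sdc) Blk_wrtn/s for consecutive samples, copied from the log above.
# Assumption: the final sample shows only sdb, so sdc is entered as 0.0.
samples = [
    (114120.00, 160197.20),
    (0.0, 0.0),
    (0.0, 0.0),
    (0.0, 0.0),
    (0.0, 0.0),
    (125551.50, 0.0),
]


def longest_stall(samples, interval_s):
    """Return the longest run of all-zero samples, in seconds."""
    longest = current = 0
    for sample in samples:
        if sum(sample) == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * interval_s


stall_s = longest_stall(samples, INTERVAL_S)
print(f"longest stall: about {stall_s} s")
```

Four empty samples give about 40 s. With 10-second sampling the true stall may be up to one interval longer, which is consistent with the 40-50 s estimate.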

 

Plug the fibre cable back in

[16:50:51 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000001

PathStatus   IO-Count    IO-Errors

Reduced      410617324   24029    

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   160984213          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Offline(E) Own   249633111      24029    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:50:52

[16:50:52 root@localhost bin]# ./dlnkmgr view -path

Paths:000002 OnlinePaths:000002

PathStatus   IO-Count    IO-Errors

Online       415619590   24029    

 

PathID PathName                        DskName                                    iLU              ChaPort Status     Type IO-Count   IO-Errors  DNum HDevName

000000 0007.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   164203113          0    0 sddlmaa

000001 0008.0000.0000000000000000.0000 HITACHI .DF600F          .85017915         0217             0A      Online     Own   251416477      24029    0 sddlmaa

KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:51:07

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               1.10         0.00        56.80          0        568

sdb            3234.50         0.00    381491.50          0    3814915

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.87    0.00   21.18   10.45    0.00   67.50

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               7.60       426.40        16.00       4264        160

sdb             335.70         0.00    343756.80          0    3437568

sdd               1.10         8.80         0.00         88          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.83    0.00   22.01   14.26    0.00   62.90

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               6.30       249.60       208.80       2496       2088

sdb             335.10         0.00    343142.40          0    3431424

sdd              11.70        95.70         0.00        957          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.80    0.00   22.89    7.45    0.00   68.87

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.60        10.40         5.60        104         56

sdb             336.00         0.00    344064.00          0    3440640

sdd              12.20        99.70         0.00        997          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.87    0.00   23.14    2.71    0.00   73.29

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               2.90       219.20         8.80       2192         88

sdb             335.20         0.00    343347.20          0    3433472

sdd               0.00         0.00         0.00          0          0

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.84    0.00   21.66    4.12    0.00   73.38

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.70         0.00         9.60          0         96

sdb             976.40         0.00    188648.40          0    1886484

sdd             993.40         0.00    153716.60          0    1537166

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.78    0.00   22.12    3.73    0.00   73.37

 

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn

sda               0.30         0.00         4.00          0         40

sdb            1269.00         0.00    163058.60          0    1630586

sdd            1428.80         0.00    180634.50          0    1806345

 

The output above shows that I/O was not affected by the reconnection, and that it returned to a load-balanced state.

 

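As shown above, failover takes time. For demanding applications, such as a heavily loaded database, this is fairly risky. It also contradicts the intuitive expectation behind dual-HBA redundancy: that if one link fails, the healthy link simply keeps working.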

Explanation from HDS:

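HDLM's default load-balancing method is round-robin (RR). For example, if the host issues I/Os A, B, C, D, E, F, G, H, ... and they are split across two paths, path 1 carries A, C, E, G, ... and path 2 carries B, D, F, H, .... After receiving the data from both paths, the storage controller reassembles it into ABCDEFGH and writes it to disk in order. Each HBA port has its own I/O queue, with an adjustable queue depth, so the host's I/Os are assigned to the queues of the two HBA ports in advance. If path 1 suddenly fails, the host holds all I/O, merges the ACEG that was waiting on path 1 with the BDFH on path 2 back into the ordered sequence ABCDEFGH, re-queues it on path 2, and sends it to the storage over path 2.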

 

So the period with no I/O is the time the host spends re-ordering the I/Os that were still waiting to be sent on the HBAs.
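
The explanation above can be illustrated with a toy model. This is only a sketch of the described behaviour, not HDLM's actual implementation: `round_robin` and `fail_over` are hypothetical helpers, each I/O is tagged with its issue order, and the failure is assumed to happen before any queued I/O has been sent.

```python
from collections import deque


def round_robin(ios, n_paths):
    """Distribute I/Os over n_paths queues in turn, tagging each with its order."""
    queues = [deque() for _ in range(n_paths)]
    for seq, io in enumerate(ios):
        queues[seq % n_paths].append((seq, io))
    return queues


def fail_over(queues, failed):
    """Hold all pending I/Os, restore issue order, re-queue on a surviving path."""
    survivor = next(i for i in range(len(queues)) if i != failed)
    pending = sorted(item for q in queues for item in q)  # sort by issue order
    for q in queues:
        q.clear()
    queues[survivor].extend(pending)
    return survivor


queues = round_robin("ABCDEFGH", 2)
before = ["".join(io for _, io in q) for q in queues]
print("queued per path:", before)

survivor = fail_over(queues, failed=0)  # path 1 (index 0) fails
after = ["".join(io for _, io in q) for q in queues]
print("after failover: ", after)
```

In this model the two queues first hold ACEG and BDFH; after path 1 fails, the surviving queue holds ABCDEFGH and the failed one is empty.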


Source: ITPUB Blog, http://blog.itpub.net/29033984/viewspace-767948/. Please credit the source when reposting; otherwise legal action may be taken.
