I. MCC Overview

Clustered MetroCluster (MCC) is the active-active storage solution provided by NetApp Data ONTAP. The original design stretched a single FAS/V-Series dual-controller system across two data centers to form a remote HA pair, with only one controller node at each site. The two sites are connected through additional FC-VI cluster adapters, and the SAS disk shelves are attached across the data centers through SAS-to-FC FibreBridges. Within 500 meters (or inside the same equipment room), the sites are connected directly through Fibre Channel switches; beyond 500 meters (up to 100 km), they are connected through Fibre Channel switches and DWDM equipment.


MetroCluster later evolved on top of this architecture. Two FAS/V-Series dual-controller arrays are deployed, one at each of sites A and B. Controller A of array A is paired with controller A of array B, and controller B of array A is paired with controller B of array B, forming two stretched pairs. This puts the data-center resources at both sites to full use, with both sites serving storage simultaneously; note, however, that controllers A and B within the same array are not clustered with each other. If either controller of a cross-site pair fails, hosts at the failed site must access the surviving controller remotely; if both controllers of a cross-site pair fail at the same time, service is interrupted.

Data ONTAP 8.3 introduced a four-controller active-active solution supporting distances of up to 200 km. The four-node MetroCluster first forms two local clusters from two HA pairs, and then joins those two clusters into a single four-node configuration. Write logs between the cluster controllers are kept in NVRAM, and log entries that have not yet been flushed to disk are mirrored, so that when a node fails, its HA-pair partner can take over the workload, and when an entire site fails, the remote HA pair can take over. When the log reaches a watermark, or a system operation forces a flush, the data written to disk is synchronously mirrored to both sites by SyncMirror. As a result, if the disks at one site fail, the disks at the other site can still serve I/O, enabling site failover without service interruption.
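A quick CLI-level way to verify the SyncMirror side of this design is to confirm that each mirrored aggregate carries two online plexes, one built from each site's disk pool. A minimal check might look like the following (the aggregate name is taken from the inspection examples later in this article; a resyncing plex would indicate a mirror rebuild in progress):

  cluster1::> storage aggregate plex show -aggregate aggr_data_A1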


MetroCluster protects data by using mirroring and clustering across two separate sites. Each cluster synchronously mirrors its data and its Storage Virtual Machine (SVM) configuration to the other cluster. When a disaster strikes one site, an administrator can activate the mirrored SVMs and take over the workload at the surviving site. In addition, the nodes within each cluster are configured as an HA pair, providing local failover capability.


NetApp MetroCluster is implemented by combining NetApp SyncMirror with the cluster_remote capability and controller Cluster Failover; a command-level sketch of a site switchover follows the list.

• Clustered Failover – provides high-availability failover between the primary and the disaster-recovery storage; the decision to take over is made by the administrator with a single command.

• SyncMirror – maintains an up-to-date copy of the data at the remote site; after a takeover, the data can be served entirely from the remote storage.

• ClusterRemote – provides the administrative mechanism for declaring that a disaster has occurred and initiating takeover by the remote storage.
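For reference, in clustered Data ONTAP the takeover and give-back implied above are driven by the metrocluster command family. A typical disaster sequence, run from the surviving cluster, looks roughly like the following sketch (illustrative only, not from the original article; the heal phases are run after the failed site has been repaired, and metrocluster operation show tracks the progress of the most recent operation):

  cluster1_dr::> metrocluster switchover -forced-on-disaster true
  cluster1_dr::> metrocluster heal -phase aggregates
  cluster1_dr::> metrocluster heal -phase root-aggregates
  cluster1_dr::> metrocluster switchback
  cluster1_dr::> metrocluster operation show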

II. Common MCC Inspection Commands

1. Check system health status

  cluster1::> system health status show
  Status
  ---------------
  ok
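If the overall status reports anything other than ok, the underlying alerts can be listed directly with the companion command below (a hedged addition, not part of the original listing; output varies with the faults present):

  cluster1::> system health alert show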

2. Check cluster status

  cluster1::> cluster show
  Node                  Health  Eligibility
  --------------------- ------- ------------
  cluster1-01           true    true
  cluster1-02           true    true
  2 entries were displayed.

3. Check cluster statistics

  cluster1::> cluster statistics show
  Counter          Value             Delta
  ---------------- ----------------- -------------
  CPU Busy:        0%                -
  Operations:
      Total:       0                 -
      NFS:         0                 -
      CIFS:        0                 -
  Data Network:
      Busy:        0%                -
      Received:    5.78GB            -
      Sent:        13.7GB            -
  Cluster Network:
      Busy:        0%                -
      Received:    967KB             -
      Sent:        979KB             -
  Storage Disk:
      Read:        6.38PB            -
      Write:       6.26PB            -

4. View aggregate (RAID group) information

  cluster1::> aggr show
  Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
  --------- -------- --------- ----- ------- ------ ---------------- ------------
  aggr0_A1   953.8GB   247.3GB   74% online       1 cluster1-01      raid4,
                                                                     mirrored,
                                                                     normal
  aggr0_A2   953.8GB   247.3GB   74% online       1 cluster1-02      raid4,
                                                                     mirrored,
                                                                     normal
  aggr_data_A1
             68.93TB   16.04TB   77% online      32 cluster1-01      mixed_raid_
                                                                     type,
                                                                     mirrored,
                                                                     hybrid,
                                                                     normal
  aggr_data_A2
             68.93TB   14.77TB   79% online      31 cluster1-02      mixed_raid_
                                                                     type,
                                                                     mirrored,
                                                                     hybrid,
                                                                     normal
  4 entries were displayed.
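For inspection purposes, the key points above are that every aggregate is online and that its RAID status includes mirrored (the SyncMirror plex pair) and normal. Exceptions can be surfaced directly with ONTAP's query negation; a minimal sketch (not from the original output):

  cluster1::> storage aggregate show -state !online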

5. View node information

  cluster1::> node show
  Node        Health Eligibility Uptime         Model   Owner Location
  ----------- ------ ----------- -------------- ------- ----- --------
  cluster1-01 true   true        369 days 19:12 FAS8040       gz_idc
  cluster1-02 true   true        369 days 19:23 FAS8040       gz_idc
  2 entries were displayed.

6. View version information

  cluster1::> version
  NetApp Release 8.3.2P9: Fri Jan 06 05:54:05 UTC 2017

7. View serial numbers

  cluster1::> system license show
  Serial Number: 1-80-023992
  Owner: cluster1
  Package           Type    Description           Expiration
  ----------------- ------- --------------------- --------------------
  Base              license Cluster Base License  -
  Serial Number: 1-81-0000000000000451515******
  Package           Type    Description           Expiration
  ----------------- ------- --------------------- --------------------
  NFS               license NFS License           -
  iSCSI             license iSCSI License         -
  Serial Number: 1-81-0000000000000451515******
  Owner: cluster1-02
  Package           Type    Description           Expiration
  ----------------- ------- --------------------- --------------------
  NFS               license NFS License           -
  iSCSI             license iSCSI License         -
  5 entries were displayed.

8. View subsystem health status

  cluster1::> system health subsystem show
  Subsystem         Health
  ----------------- ------------------
  SAS-connect       ok
  Environment       ok
  Memory            ok
  Service-Processor ok
  Switch-Health     ok
  CIFS-NDO          ok
  Motherboard       ok
  IO                ok
  MetroCluster      ok
  MetroCluster_Node ok
  FHM-Switch        ok
  FHM-Bridge        ok
  12 entries were displayed.

9. View MetroCluster configuration and node status

  cluster1::> metrocluster show
  Configuration: fabric
  Cluster                        Configuration State  Mode
  ------------------------------ -------------------- ------------------------
  Local: cluster1                configured           normal
  Remote: cluster1_dr            configured           normal

  cluster1::> metrocluster node show
  DR                               Configuration  DR
  Group Cluster Node               State          Mirroring Mode
  ----- ------- ------------------ -------------- --------- --------------------
  1     cluster1
                cluster1-01        configured     enabled   normal
                cluster1-02        configured     enabled   normal
        cluster1_dr
                cluster1_dr-01     configured     enabled   normal
                cluster1_dr-02     configured     enabled   normal
  4 entries were displayed.
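Beyond these status views, clustered Data ONTAP also provides a built-in MetroCluster consistency check that examines the whole configuration component by component. It is worth running during an inspection (output omitted here; each component should report ok):

  cluster1::> metrocluster check run
  cluster1::> metrocluster check show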

10. View controller status

  cluster1::> system controller show
  Controller Name           System ID     Serial Number     Model    Status
  ------------------------- ------------- ----------------- -------- -----------
  cluster1-01               536964819     451515******      FAS8040  ok
  cluster1-02               536961600     451515******      FAS8040  ok
  2 entries were displayed.

11. View failed disks

  cluster1::> storage disk show -broken
  There are no entries matching your query.

12. View spare disks

  cluster1::> storage disk show -spare
  Original Owner: cluster1-01
  Checksum Compatibility: block
                                               Usable  Physical
  Disk      HA Shelf Bay Chan Pool  Type RPM   Size    Size     Owner
  --------- -- ----- --- ---- ----- ---- ----- ------- -------- -----------
  1.30.11   3a 30    11  A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.30.13   3a 30    13  A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.31.4    3a 31    4   A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.32.20   4b 32    20  B    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.32.23   3a 32    23  A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.33.0    3a 33    0   A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.33.1    3a 33    1   A    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  1.33.10   4b 33    10  B    Pool0 SAS  10000 1.09TB  1.09TB   cluster1-01
  2.42.22   3a 42    22  A    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  2.42.23   4b 42    23  B    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  2.43.2    4b 43    2   B    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  2.43.22   3b 43    22  A    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  2.43.23   4b 43    23  B    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  3.11.21   4b 11    21  B    Pool0 SSD  -     372.4GB 372.6GB  cluster1-01
  4.20.21   3a 20    21  A    Pool1 SSD  -     372.4GB 372.6GB  cluster1-01
  4.21.14   3a 21    14  A    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-01
  Original Owner: cluster1-02
  Checksum Compatibility: block
                                               Usable  Physical
  Disk      HA Shelf Bay Chan Pool  Type RPM   Size    Size     Owner
  --------- -- ----- --- ---- ----- ---- ----- ------- -------- -----------
  2.44.23   3b 44    23  A    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-02
  3.12.21   4a 12    21  B    Pool0 SSD  -     372.4GB 372.6GB  cluster1-02
  4.23.21   3b 23    21  A    Pool1 SSD  -     372.4GB 372.6GB  cluster1-02
  5.60.23   3b 60    23  B    Pool1 SAS  10000 1.09TB  1.09TB   cluster1-02
  20 entries were displayed.
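Where SyncMirror is in use, each node should retain spares in both Pool0 (local plex) and Pool1 (remote plex), as the listing above shows. To check one controller at a time, the list can be narrowed with the standard -owner query parameter (a hedged example, not from the original output):

  cluster1::> storage disk show -spare -owner cluster1-02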

13. Check the SAS-to-FC bridges

  cluster1::> storage bridge show
                                         Is        Monitor
  Bridge                   Symbolic Name Monitored Status  Vendor Model             Bridge WWN
  ------------------------ ------------- --------- ------- ------ ----------------- ----------------
  ATTO_10.0.15.17          BRIDGE_B_1    true      ok      Atto   FibreBridge 6500N 2000001086627bc0
  ATTO_10.0.15.18          BRIDGE_B_2    true      ok      Atto   FibreBridge 6500N 2000001086630f0e
  ATTO_10.0.15.19          BRIDGE_B_3    true      ok      Atto   FibreBridge 6500N 2000001086630edc
  ATTO_10.0.15.20          BRIDGE_B_4    true      ok      Atto   FibreBridge 6500N 2000001086630ed2
  ATTO_10.0.15.6           BRIDGE_A_1    true      ok      Atto   FibreBridge 6500N 2000001086630eb4
  ATTO_10.0.15.7           BRIDGE_A_2    true      ok      Atto   FibreBridge 6500N 2000001086630efa
  ATTO_10.0.15.8           BRIDGE_A_3    true      ok      Atto   FibreBridge 6500N 2000001086630f18
  ATTO_10.0.15.9           BRIDGE_A_4    true      ok      Atto   FibreBridge 6500N 2000001086630ef0
  ATTO_FibreBridge6500N_10 -             false     -       Atto   FibreBridge6500N  200000108663e514
  ATTO_FibreBridge6500N_11 -             false     -       Atto   FibreBridge6500N  200000108663e3f2
  ATTO_FibreBridge6500N_12 -             false     -       Atto   FibreBridge6500N  200000108663e488
  ATTO_FibreBridge6500N_13 -             false     -       Atto   FibreBridge6500N  20000010866114ec
  ATTO_FibreBridge6500N_14 -             false     -       Atto   FibreBridge6500N  2000001086627bc0
  ATTO_FibreBridge6500N_7  -             false     -       Atto   FibreBridge6500N  2000001086630e96
  ATTO_FibreBridge6500N_9  -             false     -       Atto   FibreBridge6500N  200000108663e4c4
  15 entries were displayed.

14. Check the FC switches

  cluster1::> storage switch show
                     Symbolic                            Is        Monitor
  Switch             Name     Vendor  Model       Switch WWN       Monitored Status
  ------------------ -------- ------- ----------- ---------------- --------- -------
  Brocade_10.0.15.10 SW_A_1   Brocade Brocade6505 100050eb1a88327f true      ok
  Brocade_10.0.15.11 SW_A_2   Brocade Brocade6505 100050eb1a881582 true      ok
  Brocade_10.0.15.21 SW_B_3   Brocade Brocade6505 100050eb1a882f69 true      ok
  Brocade_10.0.15.22 SW_B_4   Brocade Brocade6505 100050eb1a881522 true      ok
  4 entries were displayed.

15. View failover status

  cluster1::> storage failover show
                                Takeover
  Node           Partner        Possible State Description
  -------------- -------------- -------- -------------------------------------
  cluster1-01    cluster1-02    true     Connected to cluster1-02
  cluster1-02    cluster1-01    true     Connected to cluster1-01
  2 entries were displayed.

16. View critical and error event logs

  cluster1::> event log show -severity critical
  There are no entries matching your query.

  cluster1::> event log show -severity error
  Time                Node             Severity      Event
  ------------------- ---------------- ------------- ---------------------------
  3/6/2018 02:28:30   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (MANAGEMENT_LOG) INFO) for host (0) was not posted to NetApp. The system will drop the message.
  3/6/2018 01:28:18   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (PERFORMANCE DATA) INFO) for host (0) was not posted to NetApp. The system will drop the message.
  3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) cluster1, Serial Number 5589765F, Certificate Authority 'cluster1' and type server for Vserver cluster1 has expired.
  3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM2, Serial Number 55A03966, Certificate Authority 'SVM2' and type server for Vserver SVM2 has expired.
  3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM, Serial Number 559FFD76, Certificate Authority 'SVM' and type server for Vserver SVM has expired.
  3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM_DR, Serial Number 545845C16E278, Certificate Authority 'SVM_DR' and type server for Vserver SVM_DR-mc has expired.
  3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM2_DR, Serial Number 545845A7B01FA, Certificate Authority 'SVM2_DR' and type server for Vserver SVM2_DR-mc has expired.
  7 entries were displayed.

17. View volume status within a given aggregate

  cluster1::> vol show -aggregate aggr_data_A1
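To focus an inspection on problem volumes, the same query negation shown earlier can be applied here as well (a sketch, not from the original article):

  cluster1::> vol show -aggregate aggr_data_A1 -state !online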

18. View LUN information and details

  cluster1::> lun show
  cluster1::> lun show -v

19. View igroup (mapping) information and details

  cluster1::> igroup show
  cluster1::> igroup show -v

20. View LUN mappings

  cluster1::> lun show -m

21. Enter a node's nodeshell

  cluster1::> run -node cluster1-01
  Type 'exit' or 'Ctrl-D' to return to the CLI
  cluster1-01>

22. View spare disks from the nodeshell

  cluster1-01> vol status -s
  Local spares

  Pool1 spare disks
  RAID Disk Device           HA SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
  --------- ------           -- ----- --- ---- ---- ---- ----- ------------------ ------------------
  Spare disks for block checksum
  spare     SW_B_3:6.126L41  3a 21    14  FC:A 1    SAS  10000 1142352/2339537408 1144641/2344225968 (not zeroed)
  spare     SW_B_3:7.126L75  3a 42    22  FC:A 1    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_B_3:7.126L101 3b 43    22  FC:A 1    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_B_4:7.126L76  4b 42    23  FC:B 1    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_B_4:7.126L29  4b 43    2   FC:B 1    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_B_4:7.126L50  4b 43    23  FC:B 1    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_B_3:6.126L22  3a 20    21  FC:A 1    SSD  N/A   381304/780910592   381554/781422768

  Pool0 spare disks
  RAID Disk Device           HA SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
  --------- ------           -- ----- --- ---- ---- ---- ----- ------------------ ------------------
  Spare disks for block checksum
  spare     SW_A_1:7.126L12  3a 30    11  FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_1:7.126L14  3a 30    13  FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_1:7.126L31  3a 31    4   FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_1:7.126L76  3a 32    23  FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_1:7.126L79  3a 33    0   FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_1:7.126L80  3a 33    1   FC:A 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_2:7.126L73  4b 32    20  FC:B 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_2:7.126L37  4b 33    10  FC:B 0    SAS  10000 1142352/2339537408 1144641/2344225968
  spare     SW_A_2:6.126L74  4b 11    21  FC:B 0    SSD  N/A   381304/780910592   381554/781422768

23. View failed disks from the nodeshell

  cluster1-01> vol status -f
  Broken disks (empty)

24. Show disks without ownership

  cluster1-01> disk show -n
  disk show : No unassigned disks

25. Assign disk ownership (commonly used after disk replacement)

  cluster1-01> disk assign all
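disk assign all claims every unowned disk for the local node, which is the common case after a disk replacement. Ownership can also be assigned per disk when a replacement should belong to a specific controller; a hedged sketch using the nodeshell's -o (owner) option and a hypothetical disk name:

  cluster1-01> disk assign 0a.33.2 -o cluster1-01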

26. View physical location information for all disks

  cluster1-01> storage show disk -p

 

Copyright notice: This is an original article by cloudos, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://www.cnblogs.com/cloudos/p/8515574.html