A Complete Guide to Fixing Split-Brain in GlusterFS Data Storage

Date: 2020-06-08

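This document describes the heal info command, which can be used to monitor the state of replicated volumes in GlusterFS, and the methods available for resolving split-brain.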

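1. Concepts

Common terms

Term	Description
Brick	The basic unit of storage in GlusterFS, represented by an export directory on a server in the trusted storage pool. A brick is written as the server name plus the absolute path of the directory: `SERVER:EXPORT`
Volume	A logical collection of N bricks
FUSE	A loadable kernel module for Unix-like operating systems that lets users create their own file systems without modifying the kernel
Glusterd	The Gluster management daemon, a background process that runs on every GlusterFS node
CLI	Command Line Interface
AFR	Automatic File Replication
GFID	The internal file identifier used by GlusterFS, a UUID that is unique to each file
Replicate Volume	A replicated volume
Client	The machine that mounts the storage exported by the servers
Server	The storage node that hosts the data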

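1.1 What is split-brain

Split-brain is a situation in which two or more replicated copies of a file have diverged. When a file is in split-brain, its data or metadata is inconsistent across the bricks of the replica, and even though all bricks are present there is not enough information to authoritatively pick one copy as pristine and heal the bad copies. For directories there is also entry split-brain, in which a file inside the directory has a different gfid or file type on the bricks of the replica. In short, split-brain occurs whenever Gluster AFR cannot determine which copy in the replica set is the correct one.

1.2 Types of split-brain

Data split-brain: the data in the file differs across the bricks of the replica set;
Metadata split-brain: the metadata differs across the bricks;
Entry/GFID split-brain: the file has a different gfid on each brick of the replica pair; this type cannot be healed automatically.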

1.3 Viewing split-brain information

gluster volume heal <VOLNAME> info

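This command lists all files that need healing (and that will be processed by the self-heal daemon). Its output consists of file paths or GFIDs.

How the heal info command works

When the command is invoked, a glfsheal process is spawned. It reads the entries in the sub-directories under /<brick-path>/.glusterfs/indices/ on every brick it can connect to; these entries are the GFIDs of files that may need healing. Once the GFID entries from a brick have been obtained, glfsheal looks the file up on each brick of the replica set and, based on the lookup responses and the trusted.afr.* extended attributes, determines whether the file needs healing, is in split-brain, or is in some other state.

Sample output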

[root@gfs ~]# gluster volume heal test info
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2> - Is in split-brain
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd> - Is in split-brain
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac> - Is in split-brain
<gfid:6dc78b20-7eb6-49a3-8edb-087b90142246>

Number of entries: 4

Brick <hostname:brickpath-b2>
/dir/file2
/dir/file1 - Is in split-brain
/dir - Is in split-brain
/dir/file3
/file4 - Is in split-brain
/dir/a

Number of entries: 6

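Interpreting the output

Every file listed in the output needs healing. A listed file may carry one of the following markers:

1) Is in split-brain

A file in data or metadata split-brain has "Is in split-brain" appended after its path/GFID, as in the output for /file4. For a file in GFID split-brain, however, the parent directory of the file is shown as being in split-brain and the file itself is shown as needing heal; in the sample output, /dir is in split-brain because of the GFID split-brain of the file /dir/a. Files in split-brain cannot self-heal until the split-brain is resolved.

2) Is possibly undergoing heal

When the heal info command runs, glfsheal takes a lock on each file to find out whether it needs healing. If the self-heal daemon has already started healing a file, it holds locks that glfsheal cannot acquire, and this message is printed instead. Another possible cause is several glfsheal processes running simultaneously (for example, several users running heal info at the same time) and competing for the same lock.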

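Example

The output discussed in this section comes from a replicated volume named test with two bricks, b1 and b2; the self-heal daemon is turned off and the volume is mounted at /mnt.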

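Analysis of the output

Brick b1 has four entries that need healing:

1) The file with gfid 6dc78b20-7eb6-49a3-8edb-087b90142246 needs healing
2) aaca219f-0e25-4576-8689-3bfd93ca70c2, 39f301ae-4038-48c2-a889-7dac143e82dd and c3c94de2-232d-4083-b534-5da17fc476ac are in split-brain

Brick b2 has six entries that need healing:

1) a, file2 and file3 need healing
2) file1, file4 and /dir are in split-brain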
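The per-brick tally above can also be produced mechanically. The following is a minimal sketch, assuming the heal info output keeps the "Brick ...", entry and "Number of entries" layout shown in this article; the function name summarize_heal_info is hypothetical, and the "Status:" line it skips is an assumption about newer releases rather than something shown here.

```shell
# Summarise `gluster volume heal <VOLNAME> info` output read from stdin.
# Prints one line per brick: "<brick> total=<n> split_brain=<n>".
summarize_heal_info() {
    awk '
        /^Brick / {
            # A new brick section starts: flush the previous one, if any
            if (brick != "") printf "%s total=%d split_brain=%d\n", brick, total, sb
            brick = $2; total = 0; sb = 0
            next
        }
        # Skip status lines, the per-brick counter and blank lines
        /^Status:/ || /^Number of entries/ || /^[ \t]*$/ { next }
        {
            total++
            if ($0 ~ / - Is in split-brain$/) sb++
        }
        END {
            if (brick != "") printf "%s total=%d split_brain=%d\n", brick, total, sb
        }
    '
}
```

It would typically be used as `gluster volume heal test info | summarize_heal_info`; because it only reads text, it can also be run against saved output.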

2. Fixing split-brain

Command syntax

gluster volume heal <VOLNAME> info split-brain

Sample output

# gluster volume heal test info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3

Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3

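Note that for GFID split-brains (same file name but different GFID), heal info reports the parent directory of the file as being in split-brain.

2.1 Resolving split-brain with the gluster command line

Once the files in split-brain have been identified, they can be healed from the gluster command line using one of several policies. Entry/GFID split-brain cannot be resolved this way; the policies below apply to data and metadata split-brain:

2.1.1 Select the bigger file as the source

This command is useful for per-file healing when it is known or decided that the file with the bigger size is to be treated as the source.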

gluster volume heal <VOLNAME> split-brain bigger-file <FILE>

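Here, <FILE> can be either the full file name as seen from the root of the volume or the GFID string of the file. Once the command is executed, the replica holding the largest copy of <FILE> is located and healing is completed with that brick as the source.

Example:

Before healing the file, note its size and md5 checksum on each brick:

On brick b1: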

[brick1]# stat b1/dir/file1
  File: ‘b1/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919362      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:55:40.149897333 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 13:55:37.206880347 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1

On brick b2:

[brick2]# stat b2/dir/file1
  File: ‘b2/dir/file1’
  Size: 13              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919365      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:54:22.974451898 +0530
Modify: 2015-03-06 13:52:22.910758923 +0530
Change: 2015-03-06 13:52:22.910758923 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/dir/file1
cb11635a45d45668a403145059c2a0d5  b2/dir/file1

Heal file1 with the following command:

gluster volume heal test split-brain bigger-file /dir/file1

After the heal completes, the md5 checksum and the file size should be the same on both bricks.

On brick b1:

[brick1]# stat b1/dir/file1
  File: ‘b1/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919362      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:17:27.752429505 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 14:17:12.880343950 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1

On brick b2:

[brick2]# stat b2/dir/file1
  File: ‘b2/dir/file1’
  Size: 17              Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919365      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:17:23.249403600 +0530
Modify: 2015-03-06 13:55:37.206880000 +0530
Change: 2015-03-06 14:17:12.881343955 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/dir/file1
040751929ceabf77c3c0b3b662f341a8  b2/dir/file1
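The before/after comparison of size and checksum can be scripted. This is a small sketch rather than part of GlusterFS: same_copy is a hypothetical helper, and it assumes GNU coreutils (wc, md5sum, cut) and that both brick paths are readable from the machine where it runs.

```shell
# Succeed (exit status 0) when two files have the same size and MD5 checksum.
# Returns 2 if either file cannot be read.
same_copy() {
    size_a=$(wc -c < "$1") || return 2
    size_b=$(wc -c < "$2") || return 2
    # md5sum prints "<checksum>  -" when reading stdin; keep only the checksum
    sum_a=$(md5sum < "$1" | cut -d' ' -f1)
    sum_b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$size_a" -eq "$size_b" ] && [ "$sum_a" = "$sum_b" ]
}
```

For the example above, `same_copy b1/dir/file1 b2/dir/file1 && echo healed` would be expected to print healed only after the heal has completed.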

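2.1.2 Select the file with the latest mtime as the source

Command syntax

gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>

This command uses the brick holding the copy of <FILE> with the latest modification time as the source for healing.

2.1.3 Select one brick of the replica as the source for a particular file

Command syntax

gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>

Here, <HOSTNAME:BRICKNAME> is selected as the source brick, and the copy of <FILE> on that brick is used as the source for healing.

Example:

Note the md5 checksums and file sizes before and after the heal.

Before the heal

On brick b1: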

[brick1]# stat b1/file4
  File: ‘b1/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919356      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
 Birth: -
[brick1]#
[brick1]# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4

On brick b2:

[brick2]# stat b2/file4
  File: ‘b2/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919358      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
 Birth: -
[brick2]#
[brick2]# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1  b2/file4

Heal the file with gfid c3c94de2-232d-4083-b534-5da17fc476ac with the following command:

gluster volume heal test split-brain source-brick test-host:/test/b1 gfid:c3c94de2-232d-4083-b534-5da17fc476ac

After the heal:

On brick b1:

# stat b1/file4
  File: ‘b1/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919356      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
 Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4

On brick b2:

# stat b2/file4
  File: ‘b2/file4’
  Size: 4               Blocks: 16         IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 919358      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
 Birth: -
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183  b2/file4

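2.1.4 Select one brick as the source for all files

Scenario: many files are in split-brain and one brick is to be used as the source for all of them.

Command syntax

gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>

With this command, the copies on <HOSTNAME:BRICKNAME> are used as the source for every file in split-brain, and the other bricks are healed from them.

Example:

Consider a volume in which three files, a, b and c, are in split-brain.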

# gluster volume heal test split-brain source-brick test-host:/test/b1
Healed gfid:944b4764-c253-4f02-b35f-0d0ae2f86c0f.
Healed gfid:3256d814-961c-4e6e-8df2-3a3143269ced.
Healed gfid:b23dd8de-af03-4006-a803-96d8bc0df004.
Number of healed entries: 3

As mentioned earlier, entry/GFID split-brain cannot be resolved from the CLI. Trying to heal /dir fails because it is in entry split-brain.

# gluster volume heal test split-brain source-brick test-host:/test/b1 /dir
Healing /dir failed:Operation not permitted.
Volume heal failed.

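Such a case can, however, be fixed by deleting the file from all bricks except the one chosen as the source. See the section on fixing directory split-brain below.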

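2.2 Resolving split-brain from the client

The getfattr and setfattr commands can be used from the mount point to detect the data and metadata split-brain status of a file and to resolve the split-brain.

The examples use a volume named test with bricks b0, b1, b2 and b3.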

# gluster volume info test

Volume Name: test
Type: Distributed-Replicate
Volume ID: 00161935-de9e-4b80-a643-b36693183b61
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: test-host:/test/b0
Brick2: test-host:/test/b1
Brick3: test-host:/test/b2
Brick4: test-host:/test/b3

The directory structure of the bricks is as follows:

# tree -R /test/b?
/test/b0
├── dir
│   └── a
└── file100

/test/b1
├── dir
│   └── a
└── file100

/test/b2
├── dir
├── file1
├── file2
└── file99

/test/b3
├── dir
├── file1
├── file2
└── file99

List the files that are in split-brain:

# gluster v heal test info split-brain
Brick test-host:/test/b0/
/file100
/dir
Number of entries in split-brain: 2

Brick test-host:/test/b1/
/file100
/dir
Number of entries in split-brain: 2

Brick test-host:/test/b2/
/file99
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2

Brick test-host:/test/b3/
<gfid:05c4b283-af58-48ed-999e-4d706c7b97d5>
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2

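The data/metadata split-brain status of a file can be queried with the following command:

getfattr -n replica.split-brain-status <path-to-file>

Run from the client, this command reports whether the file is in data or metadata split-brain, and it also lists the choices (replicas) from which the file can be analysed further. It does not work for gfid/directory split-brain.

Example:

1) file100 is in metadata split-brain.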

# getfattr -n replica.split-brain-status file100
file: file100
replica.split-brain-status="data-split-brain:no    metadata-split-brain:yes    Choices:test-client-0,test-client-1"

2) file1 is in data split-brain.

# getfattr -n replica.split-brain-status file1
file: file1
replica.split-brain-status="data-split-brain:yes    metadata-split-brain:no    Choices:test-client-2,test-client-3"

3) file99 is in both data and metadata split-brain.

# getfattr -n replica.split-brain-status file99
file: file99
replica.split-brain-status="data-split-brain:yes    metadata-split-brain:yes    Choices:test-client-2,test-client-3"

4) dir is in directory split-brain but, as mentioned earlier, the command above does not report this kind of split-brain.

# getfattr -n replica.split-brain-status dir
file: dir
replica.split-brain-status="The file is not under data or metadata split-brain"

5) file2 is not in any kind of split-brain.

# getfattr -n replica.split-brain-status file2
file: file2
replica.split-brain-status="The file is not under data or metadata split-brain"
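The status string is easy to pick apart in a script. The sketch below assumes the value has exactly the "key:value" fields separated by whitespace shown in the examples above; status_field is a hypothetical helper name, not a GlusterFS command.

```shell
# Print the value of one field ("data-split-brain", "metadata-split-brain"
# or "Choices") from a replica.split-brain-status string.
# args: field-name status-string
status_field() {
    # Put each whitespace-separated field on its own line, then strip "name:"
    printf '%s\n' "$2" | tr -s ' \t' '\n\n' | sed -n "s/^$1://p"
}
```

A possible way to feed it, assuming the getfattr on the client supports the --only-values flag: `status_field Choices "$(getfattr -n replica.split-brain-status --only-values file1)"`. For a file that is not in data or metadata split-brain the function prints nothing.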

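Analysing files in data and metadata split-brain

Operations performed from the client on a file in split-brain (cat, getfattr and so on) fail with an input/output error. To make it possible to analyse such files, GlusterFS accepts a special setfattr call on the mount point:

# setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>

With this command, a particular brick can be chosen as the one from which the split-brain file is accessed.

Example:

1) "file1" is in split-brain. Trying to read the file fails with an input/output error.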

# cat file1
cat: file1: Input/output error

file1 is in split-brain between test-client-2 and test-client-3.

Setting test-client-2 as the split-brain choice for file1 makes it possible to read the file from b2.

# setfattr -n replica.split-brain-choice -v test-client-2 file1

Now read the file.

# cat file1
xyz

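Similarly, to inspect the file from the other choice, set replica.split-brain-choice to test-client-3.

Note that inspecting a file from a wrong choice results in an error.

To undo a split-brain choice that has been set, run the same setfattr command with none as the value of the extended attribute.

Example: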

# setfattr -n replica.split-brain-choice -v none file1

Reading the file now fails with an Input/output error again, as before.

# cat file1
cat: file1: Input/output error

Once the copy to keep has been decided, that brick must be set as the source for the heal. This is done with the following command:

# setfattr -n replica.split-brain-heal-finalize -v <heal-choice> <path-to-file>

Example

# setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1

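The command above can be used to heal both data and metadata split-brain on any file.

Note:

1) If the fopen-keep-cache fuse mount option is disabled, the inode must be invalidated each time before a new replica.split-brain-choice is selected to inspect a file. This can be done with the following command:

# setfattr -n inode-invalidate -v 0 <path-to-file>

2) The client-side procedure described above does not work on NFS mounts, because NFS does not provide xattr support.

2.3 Automatic split-brain resolution

The command-line and client-side methods both require manual work: someone has to run the commands. The cluster.favorite-child-policy volume option, when set to one of the available policies, resolves split-brain automatically without user intervention. Its default value is none, which means it is disabled.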

# gluster volume set help | grep -A3 cluster.favorite-child-policy
Option: cluster.favorite-child-policy
Default Value: none
Description: This option can be used to automatically resolve split-brains using various policies without user intervention. "size" picks the file with the biggest size as the source. "ctime" and "mtime" pick the file with the latest ctime and mtime respectively as the source. "majority" picks a file with identical mtime and size in more than half the number of bricks in the replica.

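cluster.favorite-child-policy applies to every file in the volume. With this option enabled, split-brain files no longer have to be healed by hand one at a time; they are healed automatically according to the configured policy.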
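As an illustration, the option would be set with the usual gluster volume set syntax. The helper below is a hedged sketch: favorite_child_policy_cmd is a hypothetical name, it only prints the command instead of running it, and the accepted values are the ones named in the help text above.

```shell
# Print (not run) the command that sets cluster.favorite-child-policy.
# args: volume-name policy
favorite_child_policy_cmd() {
    case "$2" in
        none|size|ctime|mtime|majority)
            printf 'gluster volume set %s cluster.favorite-child-policy %s\n' "$1" "$2" ;;
        *)
            # Anything else is not a policy listed in the help text
            printf 'unknown policy: %s\n' "$2" >&2
            return 1 ;;
    esac
}
```

For example, `favorite_child_policy_cmd test mtime` prints the command that would make the copy with the latest mtime win on the volume test; review the printed command before executing it, since the policy then applies to every file in the volume.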

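2.4 Best practices

1. Get the paths of the files in split-brain:

This can be done in either of two ways:

a) Run the command gluster volume heal <VOLNAME> info split-brain.
b) Identify the files for which file operations performed from the client keep failing with Input/Output error.

2. Close the applications that have the file open from the client. Virtual machines must be powered off.

3. Decide which copy is correct:

Use the getfattr command to fetch and inspect the changelog extended attributes, then use them to decide which bricks hold a trustworthy copy of the file.

getfattr -d -m . -e hex <file-path-on-brick>

It is possible for one brick to hold the correct data while another holds the correct metadata.

4. Use the setfattr command to reset the relevant extended attributes on the brick(s) holding the "bad copy" of the file data/metadata.

5. Trigger self-heal on the file by performing a lookup from the client:

ls -l <file-path-on-gluster-mount>

Detailed explanation of steps 3 to 5:

To understand how to resolve split-brain, we first need to understand the changelog extended attributes.

getfattr -d -m . -e hex <file-path-on-brick>

Example: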

[root@store3 ~]# getfattr -d -e hex -m. brick-a/file.txt
# file: brick-a/file.txt
security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.afr.vol-client-3=0x000000000200000000000000
trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b

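The extended attributes named trusted.afr.<volname>-client-<subvolume-index> are used by AFR to maintain the changelog of the file. Their values are calculated by the glusterfs client (fuse or nfs-server) processes: when a glusterfs client modifies a file or directory, it contacts each brick and updates the changelog extended attributes according to the brick's response.

Example: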

[root@pranithk-laptop ~]# gluster volume info vol
 Volume Name: vol
 Type: Distributed-Replicate
 Volume ID: 4f2d7849-fbd6-40a2-b346-d13420978a01
 Status: Created
 Number of Bricks: 4 x 2 = 8
 Transport-type: tcp
 Bricks:
 brick-a: pranithk-laptop:/gfs/brick-a
 brick-b: pranithk-laptop:/gfs/brick-b
 brick-c: pranithk-laptop:/gfs/brick-c
 brick-d: pranithk-laptop:/gfs/brick-d
 brick-e: pranithk-laptop:/gfs/brick-e
 brick-f: pranithk-laptop:/gfs/brick-f
 brick-g: pranithk-laptop:/gfs/brick-g
 brick-h: pranithk-laptop:/gfs/brick-h

In the example above:

Brick             |    Replica set        |    Brick subvolume index
----------------------------------------------------------------------------
-/gfs/brick-a     |       0               |       0
-/gfs/brick-b     |       0               |       1
-/gfs/brick-c     |       1               |       2
-/gfs/brick-d     |       1               |       3
-/gfs/brick-e     |       2               |       4
-/gfs/brick-f     |       2               |       5
-/gfs/brick-g     |       3               |       6
-/gfs/brick-h     |       3               |       7
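The mapping in this table is plain integer arithmetic. The sketch below assumes, as the table suggests, that bricks are grouped into replica sets in the order they are listed; replica_set_of and afr_xattrs_for are hypothetical helper names.

```shell
# Print the replica set a brick belongs to.
# args: subvolume-index replica-count
replica_set_of() {
    echo $(( $1 / $2 ))
}

# Print the trusted.afr.* attribute names expected on a brick, one per
# member of its replica set.
# args: volume-name subvolume-index replica-count
afr_xattrs_for() {
    first=$(( $2 / $3 * $3 ))   # index of the first brick in the same set
    i=0
    while [ "$i" -lt "$3" ]; do
        echo "trusted.afr.$1-client-$(( first + i ))"
        i=$(( i + 1 ))
    done
}
```

For brick-d (index 3, replica count 2) this yields replica set 1 and the attribute names trusted.afr.vol-client-2 and trusted.afr.vol-client-3, the same pair seen in the getfattr output earlier in this section.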

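Each file on a brick maintains its own changelog and also the changelog of the copies on every other brick in its replica set, as seen by that brick.

In the example volume above, every file on brick-a has two entries, one for itself and one for the copy on the other brick of its replica set, brick-b:

trusted.afr.vol-client-0=0x000000000000000000000000 --> changelog for itself (brick-a)
trusted.afr.vol-client-1=0x000000000000000000000000 --> changelog for brick-b as seen by brick-a

Likewise, every file on brick-b has:
trusted.afr.vol-client-0=0x000000000000000000000000 --> changelog for brick-a as seen by brick-b
trusted.afr.vol-client-1=0x000000000000000000000000 --> changelog for itself (brick-b)

Interpreting changelog values

Each extended attribute has a value of 24 hexadecimal digits. The first 8 digits represent the changelog of data, the next 8 digits the changelog of metadata, and the last 8 digits the changelog of directory entries.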

0x 000003d7 00000001 00000000
        |      |       |
        |      |        \_ changelog of directory entries
        |       \_ changelog of metadata
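         \_ changelog of data

The first 8-digit field records the data changelog.
The middle 8-digit field records the metadata changelog.
The last 8-digit field records the entry (gfid) changelog.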
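The three fields can be split and converted to decimal with a few lines of shell. This is a minimal sketch assuming the value has the 0x prefix and 24 hexadecimal digits described above; decode_changelog is a hypothetical helper name.

```shell
# Split a trusted.afr.* value into its three counters.
# Prints "data=<n> metadata=<n> entry=<n>" in decimal.
decode_changelog() {
    hex=${1#0x}
    [ "${#hex}" -eq 24 ] || { echo "expected 24 hex digits: $1" >&2; return 1; }
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # Shell arithmetic accepts 0x-prefixed hexadecimal constants
    echo "data=$(( 0x$data )) metadata=$(( 0x$meta )) entry=$(( 0x$entry ))"
}
```

Applied to the value in the diagram, `decode_changelog 0x000003d70000000100000000` prints data=983 metadata=1 entry=0.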

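When a file is in split-brain, its changelogs look like this:

Example: (data and metadata split-brain on the same file, comparing the two copies)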

[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
# file: gfs/brick-a/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000100000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
# file: gfs/brick-b/a
trusted.afr.vol-client-0=0x000003b00000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

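Interpreting the result

Changelog extended attributes on the file /gfs/brick-a/a:

The first 8 digits of trusted.afr.vol-client-0 are all zeros (0x00000000................)
The first 8 digits of trusted.afr.vol-client-1 are not all zeros (0x000003d7................)
So the changelog on /gfs/brick-a/a says that some data operations succeeded on itself but failed on /gfs/brick-b/a.

The second 8 digits of trusted.afr.vol-client-0 are all zeros (0x........00000000........)
The second 8 digits of trusted.afr.vol-client-1 are not all zeros (0x........00000001........)
So the changelog on /gfs/brick-a/a says that some metadata operations succeeded on itself but failed on /gfs/brick-b/a.

Changelog extended attributes on the file /gfs/brick-b/a:

The first 8 digits of trusted.afr.vol-client-0 are not all zeros (0x000003b0................)
The first 8 digits of trusted.afr.vol-client-1 are all zeros (0x00000000................)
So the changelog on /gfs/brick-b/a says that some data operations succeeded on itself but failed on /gfs/brick-a/a.

The second 8 digits of trusted.afr.vol-client-0 are not all zeros (0x........00000001........)
The second 8 digits of trusted.afr.vol-client-1 are all zeros (0x........00000000........)
So the changelog on /gfs/brick-b/a says that some metadata operations succeeded on itself but failed on /gfs/brick-a/a.

Each copy therefore has data and metadata changes that are not present on the other copy, so the file is in both data and metadata split-brain.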
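The reasoning above boils down to one rule: a part is in split-brain when both bricks hold a non-zero changelog for each other in that part. A minimal sketch of that rule follows, assuming a two-brick replica and 24-digit values; classify_split_brain is a hypothetical name, and this is not how AFR itself is implemented.

```shell
# Print which kinds of split-brain two mutually accusing changelogs indicate.
# $1: value brick-a holds about brick-b (e.g. trusted.afr.vol-client-1 on brick-a)
# $2: value brick-b holds about brick-a (e.g. trusted.afr.vol-client-0 on brick-b)
classify_split_brain() {
    a=${1#0x}; b=${2#0x}
    result=""
    for part in data:1-8 metadata:9-16 entry:17-24; do
        name=${part%%:*}; range=${part#*:}
        pa=$(printf '%s' "$a" | cut -c"$range")
        pb=$(printf '%s' "$b" | cut -c"$range")
        # Split-brain for this part only when BOTH sides blame each other
        if [ "$pa" != "00000000" ] && [ "$pb" != "00000000" ]; then
            result="$result $name"
        fi
    done
    result=${result# }
    echo "${result:-none}"
}
```

With the values from the example, `classify_split_brain 0x000003d70000000100000000 0x000003b00000000100000000` prints data metadata.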

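Deciding on the correct copy

Use the output of the stat and getfattr commands to decide which metadata to retain, and the contents of the file to decide which data to retain.

Continuing the example above, suppose we want to retain the data of /gfs/brick-a/a and the metadata of /gfs/brick-b/a.

Resetting the relevant changelogs to resolve the split-brain:

Resolving the data split-brain:

The changelog extended attributes must be changed so that they read as if some data operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a. /gfs/brick-b/a must therefore not carry any changelog claiming that data operations succeeded on it but failed on /gfs/brick-a/a, so reset the data part of the changelog in trusted.afr.vol-client-0 on /gfs/brick-b/a.

Resolving the metadata split-brain:

The changelog extended attributes must be changed so that they read as if some metadata operations succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a. /gfs/brick-a/a must therefore not carry any changelog claiming that metadata operations succeeded on it but failed on /gfs/brick-b/a, so reset the metadata part of the changelog in trusted.afr.vol-client-1 on /gfs/brick-a/a.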

The changes to make are as follows:

On /gfs/brick-b/a, change trusted.afr.vol-client-0
from 0x000003b00000000100000000 to 0x000000000000000100000000

The metadata part stays non-zero, as intended. Run: setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a

On /gfs/brick-a/a, change trusted.afr.vol-client-1
from 0x000003d70000000100000000 to 0x000003d70000000000000000

The data part stays non-zero, as intended. Run: setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a
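Typing 24-digit values by hand is error-prone, so the new value can be derived from the old one. The helper below is a sketch under the same 24-digit assumption; reset_changelog_part is a hypothetical name and it only prints the value, leaving the setfattr call to the operator.

```shell
# Zero one part ("data", "metadata" or "entry") of a trusted.afr.* value
# and print the resulting value.
# args: part current-value
reset_changelog_part() {
    hex=${2#0x}
    d=$(printf '%s' "$hex" | cut -c1-8)
    m=$(printf '%s' "$hex" | cut -c9-16)
    e=$(printf '%s' "$hex" | cut -c17-24)
    case "$1" in
        data)     d=00000000 ;;
        metadata) m=00000000 ;;
        entry)    e=00000000 ;;
        *) echo "unknown part: $1" >&2; return 1 ;;
    esac
    echo "0x$d$m$e"
}
```

For the example, `reset_changelog_part data 0x000003b00000000100000000` prints 0x000000000000000100000000, which is the value to pass to setfattr on /gfs/brick-b/a.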

After these operations, the changelogs look like this:

[root@pranithk-laptop vol]# getfattr -d -m . -e hex /gfs/brick-?/a
getfattr: Removing leading '/' from absolute path names
# file: gfs/brick-a/a
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000003d70000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

# file: gfs/brick-b/a
trusted.afr.vol-client-0=0x000000000000000100000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57

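Run ls -l <file-path-on-gluster-mount> to trigger the self-heal.

Fixing directory split-brain

When a directory is in split-brain, AFR can conservatively merge the differing entries in it. If the directory storage has entries 1 and 2 on one brick and entries 3 and 4 on the other, AFR merges them so that the directory holds entries 1, 2, 3 and 4 on both bricks. If the split-brain was caused by files being deleted from the directory, however, this merge can make the deleted files reappear. Human intervention is needed when the directory contains at least one entry that has the same file name but a different gfid on the bricks. For example:

On brick-a the directory has two entries, file1 with gfid_x and file2. On brick-b the directory has two entries, file1 with gfid_y and file3. The gfid of file1 differs between the bricks. This kind of directory split-brain needs human intervention: either file1 on brick-a or file1 on brick-b must be deleted to resolve it.

In addition, the corresponding gfid-link file must be removed. The gfid-link files live in the .glusterfs directory under the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute obtained from the getfattr command earlier), the gfid-link file can be found at /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b.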
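The path is built from the first two pairs of hex digits of the gfid. The sketch below follows the layout exactly as written in this article; gfid_link_path is a hypothetical helper. As a caveat, the file name on an actual brick is typically stored in dashed UUID form (8-4-4-4-12), so it is worth confirming with ls before deleting anything.

```shell
# Print the gfid-link path for a trusted.gfid value.
# args: brick-path gfid (hex, with or without the 0x prefix or dashes)
gfid_link_path() {
    # Normalise: drop the 0x prefix and any dashes, lower-case the digits
    g=$(printf '%s' "${2#0x}" | sed 's/-//g' | tr 'A-F' 'a-f')
    [ "${#g}" -eq 32 ] || { echo "expected 32 hex digits: $2" >&2; return 1; }
    echo "$1/.glusterfs/$(printf '%s' "$g" | cut -c1-2)/$(printf '%s' "$g" | cut -c3-4)/$g"
}
```

Calling `gfid_link_path /gfs/brick-a 0x307a5c9efddd4e7c96e94fd4bcdcbd1b` prints the path given in the paragraph above.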

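Caution

Before deleting the gfid-link, make sure that no other hard links to the file exist on that brick. If hard links exist, they must be deleted as well.

This article is adapted from the official GlusterFS documentation.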