Understanding SPDK in Depth, Part 5: SPDK Troubleshooting, Part A

Symptom

Running the SPDK program produces the following error:

starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status

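As a result, the program cannot make progress. What is the cause?

Analysis

Restrictions on using NVMe hardware queues

According to the NVMe specification, a hardware queue consists of one submission queue and one completion queue; the two must work together to process an I/O request: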

[Figure: NVMe command processing flow, steps 1 to 8]

The specification describes the flow as follows:

When host software builds a command for the controller to execute, it first checks to make sure that the appropriate Submission Queue (SQx) is not full. The Submission Queue is full when the number of entries in the queue is one less than the queue size. Once an empty slot (pFreeSlot) is available:
1. Host software builds a command at SQx[pFreeSlot] with:
   a. CDW0.OPC is set to the appropriate command to be executed by the controller;
   b. CDW0.FUSE is set to the appropriate value, depending on whether the command is a fused operation;
   c. CDW0.CID is set to a unique identifier for the command when combined with the Submission Queue identifier;
   d. The Namespace Identifier, CDW1.NSID, is set to the namespace the command applies to;
   e. MPTR shall be filled in with the offset to the beginning of the Metadata Region, if there is a data transfer and the namespace format contains metadata as a separate buffer;
   f. PRP1 and/or PRP2 (or SGL Entry 1 if SGLs are used) are set to the source/destination of data transfer, if there is a data transfer; and
   g. CDW10 – CDW15 are set to any command specific information;
   and
2. Host software writes the corresponding Submission Queue doorbell register (SQxTDBL) to submit one or more commands for processing.

The write to the Submission Queue doorbell register triggers the controller to consume one or more new commands contained in the Submission Queue entry. The controller indicates the most recent SQ entry that has been consumed as part of reporting completions. Host software may use this information to determine when SQ slots may be re-used for new commands.
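The "queue full" rule and the build-then-doorbell order quoted above can be sketched in plain C. This is only an illustration, not SPDK code: `SQ_SIZE`, `struct sq_entry`, `struct sub_queue`, `sq_is_full` and `sq_submit` are hypothetical names, the command is reduced to three fields, and the doorbell register is modelled as an ordinary struct member.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQ_SIZE 4  /* hypothetical queue depth, kept tiny for illustration */

/* Simplified command; only a few of the fields named in the text are modelled. */
struct sq_entry {
    uint8_t  opc;   /* CDW0.OPC */
    uint16_t cid;   /* CDW0.CID */
    uint32_t nsid;  /* CDW1.NSID */
};

struct sub_queue {
    struct sq_entry slots[SQ_SIZE];
    uint32_t head;      /* last SQ head reported back by the controller */
    uint32_t tail;      /* next free slot, owned by host software */
    uint32_t doorbell;  /* stand-in for the SQxTDBL register */
};

/* Full when the number of queued entries is one less than the queue size. */
static int sq_is_full(const struct sub_queue *sq)
{
    return (sq->tail + 1) % SQ_SIZE == sq->head;
}

/* Build the command in the free slot first, then write the doorbell.
 * Returns 0 on success, -1 if the queue is full and the caller must retry. */
static int sq_submit(struct sub_queue *sq, uint8_t opc, uint16_t cid, uint32_t nsid)
{
    assert(sq != NULL);
    if (sq_is_full(sq))
        return -1;

    struct sq_entry *e = &sq->slots[sq->tail];
    memset(e, 0, sizeof(*e));
    e->opc = opc;
    e->cid = cid;
    e->nsid = nsid;

    sq->tail = (sq->tail + 1) % SQ_SIZE;
    sq->doorbell = sq->tail;  /* written only after the entry is complete */
    return 0;
}
```

With a depth of 4, three commands fit before `sq_submit` starts returning -1. A slot becomes reusable only once the controller has reported a newer head, which the sketch models by updating `head` by hand.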

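In the flow above, steps 3, 4, 5, and 6 are performed by the NVMe controller hardware, while steps 1/2 and 7/8 are performed by software on the host side. Steps 1 and 2 must be executed in strict order, and so must steps 7 and 8.

SPDK's default core binding

Based on this flow, SPDK provides an API that wraps steps 1, 2, 7, and 8 so that they can be used as function calls. If multiple threads call this API at the same time to drive the same hardware queue pair, the ordering constraints described above may be violated. For this reason, an SPDK thread is bound to a specific processor core by default during initialization.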

@@ -448,7 +448,7 @@ int init(const char * dev_name) {
     spdk_env_opts_init(&opts);
     opts.name = "append_demo";
     opts.shm_id = 0;
     opts.core_mask = "0x8";
     if (spdk_env_init(&opts) < 0) {
         fprintf(stderr, "Unable to initialize Spdk env\n");
         return -1;
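The value "0x8" assigned to `opts.core_mask` above is a hexadecimal bitmask. Assuming it follows the usual DPDK convention in which bit N selects core N, it binds the application to core 3. The sketch below decodes such a mask; `core_mask_to_list` is a hypothetical helper written for this article, not an SPDK function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode a hexadecimal core mask such as "0x8" into CPU core indices.
 * Returns the number of indices written to `cores` (at most `max`). */
static int core_mask_to_list(const char *mask, int *cores, int max)
{
    assert(mask != NULL && cores != NULL);

    /* Base 16 accepts an optional "0x" prefix. */
    uint64_t bits = strtoull(mask, NULL, 16);
    int n = 0;

    for (int cpu = 0; cpu < 64 && n < max; cpu++) {
        if (bits & (UINT64_C(1) << cpu))
            cores[n++] = cpu;
    }
    return n;
}
```

Calling it with "0x8" yields the single core index 3, and "0x5" yields cores 0 and 2.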

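Notes on SPDK threads

The analysis above shows that a single hardware queue pair must not be used by multiple threads at the same time, but different hardware queue pairs can be used concurrently by different threads, one per thread.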
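The "one queue pair per thread" rule can be sketched with POSIX threads. Everything here is a stand-in: `struct demo_qpair`, `struct worker`, `worker_main` and `run_workers` are hypothetical names, and the "submission" only advances a counter. The point is the ownership pattern: because each worker touches only its own queue pair, the build-then-doorbell sequence is never interleaved with another thread and no lock is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4
#define CMDS_PER_WORKER 1000

/* Hypothetical stand-in for a hardware queue pair; not an SPDK type. */
struct demo_qpair {
    uint32_t sq_tail;    /* advanced by the owning thread only */
    uint32_t submitted;  /* commands accepted on this queue pair */
};

struct worker {
    pthread_t tid;
    struct demo_qpair qp;  /* private to this worker, so no lock is needed */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < CMDS_PER_WORKER; i++) {
        /* Build the command, then move the tail, as one uninterrupted unit. */
        w->qp.sq_tail = (w->qp.sq_tail + 1) % 256;
        w->qp.submitted++;
    }
    return NULL;
}

/* Start NUM_WORKERS threads, each with its own queue pair,
 * and return the total number of commands submitted. */
static uint32_t run_workers(void)
{
    struct worker workers[NUM_WORKERS] = {0};
    uint32_t total = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(workers[i].tid, NULL);
        total += workers[i].qp.submitted;
    }
    return total;
}
```

If two workers were handed a pointer to the same `demo_qpair`, the updates to `sq_tail` would race, which is the situation the rule above forbids.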

Verification

After the program was modified according to the analysis above, the error disappeared immediately.

Symptom

ERROR: requested 256 hugepages but only 2 could be allocated.
Memory might be heavily fragmented. Please try flushing the system cache, or reboot the machine.

[root@036db0018 scripts]# free -m
             total       used       free     shared    buffers     cached
Mem:        128332     124880       3451          0       2665     102940
-/+ buffers/cache:      19275     109056
Swap:            0          0          0

[root@036db0018 scripts]# echo 3 > /proc/sys/vm/drop_caches

[root@036db0018 scripts]#
[root@036db0018.bdbl.baidu.com scripts]# free -m
             total       used       free     shared    buffers     cached
Mem:        128332      14382     113949          0        129        797
-/+ buffers/cache:      13455     114876
Swap:            0          0          0

With the page cache dropped, run the setup command again:

[root@bdbl-inf-bce036db0018 scripts]# NRHUGE=256 ./single_setup_b0.sh config

This time no error is reported.
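Before starting an SPDK application, one way to confirm that the hugepages were really reserved is to read the `HugePages_Total` and `HugePages_Free` fields of /proc/meminfo on Linux. The sketch below is an assumption about how such a check could look, not part of the setup script; `meminfo_value` and `read_meminfo` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the numeric value of `key` (e.g. "HugePages_Free") from text in
 * /proc/meminfo format, or -1 if the key is missing or has no number. */
static long meminfo_value(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A line matches only if the key is followed directly by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long v;
            if (sscanf(line + klen + 1, "%ld", &v) == 1)
                return v;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/meminfo (Linux only) and look up one field. */
static long read_meminfo(const char *key)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_value(buf, key);
}
```

A caller could compare `read_meminfo("HugePages_Total")` with the requested count (256 in the command above) and refuse to start if fewer pages were allocated.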
