Attention Mechanism + Human Pose Estimation

Date: 2019-12-14 · Tags: attention mechanism, pose estimation, human pose estimation

This post starts from ***Multi-Context Attention for Human Pose Estimation (CVPR2017)*** and shares my understanding of the paper. Corrections from more experienced readers are very welcome.

Attention Mechanism

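First, the attention mechanism. The first time I saw the term, I thought it sounded incredibly sophisticated (I hadn't seen much of the world yet). **At! Ten! Tion!** How impressive is that? AI really lives up to its reputation. For a long time I stayed at the stage of imagining how powerful it must be, until I read this blog post: Attention注意力机制–原理与应用 (Attention Mechanism: Principles and Applications)
Link: https://blog.csdn.net/joshuaxx316/article/details/70665388
(Read it all the way through. It's really good!)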

Multi-Context Attention for Human Pose Estimation

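Next, the paper itself: Multi-Context Attention for Human Pose Estimation (CVPR2017). It comes from Xiaogang Wang's group at the Chinese University of Hong Kong, which can fairly be called the leading group in human pose estimation.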

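More specifically, 2017 saw a wave of interest in attention, and many papers incorporated the idea. This paper grew out of that wave, applying attention to human pose estimation.
The authors write in the abstract: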

we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation.

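The multi-context, multi-resolution, and hierarchical attention ideas in the paper all have some counterpart in Hourglass Net (Stacked hourglass networks for human pose estimation. In ECCV, 2016.). In my view, the paper's biggest highlight is using a CRF (conditional random field) to implement the holistic attention model. (The paper has other highlights, but this post focuses on the attention mechanism and how it is implemented with a CRF. The paper's HRU (Hourglass Residual Unit) is also a nice improvement on the residual module and is worth a look if you are interested.)
Quoting the paper: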

We propose to use visual attention mechanism to automatically learn and infer the contextual representations, driving the model to focus on region of interest. We tailor the attention scheme for human pose estimation by introducing CRFs to model the spatial correlations among neighborhood joints.
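In formula form, the basic "focus on the region of interest" step is a re-weighting of the feature map. Below is a simplified sketch in my own notation, without the CRF refinement; the sigmoid is modeled on the nn.Sigmoid used in AttentionPartsCRF.lua, and this is not a transcription of the paper's equations. A convolution $W$ with bias $b$ maps the $C$-channel feature map $f$ to a single-channel score map, a sigmoid $\sigma$ squashes it into $(0,1)$, and every channel is multiplied element-wise ($\odot$) by the result:

$$
s = W * f + b, \qquad \Phi = \sigma(s), \qquad h_c = \Phi \odot f_c, \quad c = 1, \dots, C
$$

Locations where $\Phi$ is close to 0 are suppressed in every channel, while locations where it is close to 1 pass through almost unchanged.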

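One more note: the CRF (conditional random field) is a classic model among probabilistic graphical models. If you are interested, see Zhou Zhihua's "watermelon book" (Machine Learning).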

As the abstract puts it:

The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on detailed descriptions for different body parts.
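The CRF part can be read off the AttentionIter function in AttentionPartsCRF.lua: the single-channel score map is refined for a few steps, and each location is updated from its neighbors' current scores through one shared convolution. In my own notation (biases omitted), with $W^u$ the $3\times3$ convolution from $C$ channels to one, $W^k$ the shared `lrnSize` $\times$ `lrnSize` kernel, and $T$ = `itersize`:

$$
\begin{aligned}
U &= W^{u} * f \\
Q_1 &= \sigma\left(W^{k} * U + U\right) \\
Q_t &= \sigma\left(W^{k} * Q_{t-1} + U\right), \quad t = 2, \dots, T \\
h_c &= f_c \odot Q_T, \quad c = 1, \dots, C
\end{aligned}
$$

Because $W^k$ is reused at every step (`spConv` and `spConv_clone` share weights), this looks like a few unrolled steps of mean-field inference: $U$ acts as the unary term and $W^k * Q_{t-1}$ as the pairwise message from neighboring locations. For body-part attention (`usepart ~= 0`), `AttentionPartsCRF` runs one such branch per part with its own weights, reduces each $h$ to one channel with a $1\times1$ convolution, and concatenates the per-part maps along the channel dimension.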

Here comes the important part!!!

Attention Mechanism
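This is covered in Section 5 of the paper, so I won't go into much detail here. If you want to know how the attention mechanism is implemented in the paper, first read the link above: https://blog.csdn.net/joshuaxx316/article/details/70665388. It makes the concrete implementation much easier to follow; reading the paper on its own can leave you in a fog, and reading the code alongside it also helps with the details.
The code for this paper is also public on GitHub and is written in Torch7. The attention mechanism lives mainly in the file AttentionPartsCRF.lua: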

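```lua
-- From AttentionPartsCRF.lua (Torch7 + nngraph). `nnlib` and `outputDim` are
-- not defined in this snippet; they come from elsewhere in the repository.

-- One attention branch: a 3x3 conv maps numIn channels to a 1-channel score
-- map U, then `itersize` steps refine it with a shared lrnSize x lrnSize kernel.
local function AttentionIter(numIn, inp , lrnSize, itersize)
    local pad = math.floor(lrnSize/2)
    local U = nnlib.SpatialConvolution(numIn,1, 3, 3, 1,1,1,1)(inp)
    local spConv = nnlib.SpatialConvolution(1,1,lrnSize,lrnSize,1,1,pad,pad)
    -- need to share the parameters and the gradParameters as well
    local spConv_clone = spConv:clone('weight','bias','gradWeight','gradBias')

    local Q={}
    local C = {}
    for i=1,itersize do
        local conv 
        local Q_tmp

        -- step 1 convolves U itself; later steps convolve the previous Q
        if i==1 then
            conv = spConv(U)
        else
            conv = spConv_clone(Q[i-1])
        end
        table.insert(C,conv)
        -- Q_i = sigmoid(conv + U): neighbor message plus the unary score U
        Q_tmp = nn.Sigmoid()(nn.CAddTable(true)({C[i], U}))
        table.insert(Q,Q_tmp)
    end

    -- repeat the final 1-channel map numIn times along dim 2 and multiply it
    -- element-wise with the input features
    local pfeat = nn.CMulTable(){inp, nn.Replicate(numIn,   2){Q[itersize]}}
    return pfeat 
end

-- usepart == 0: holistic attention (one branch shared by all channels).
-- Otherwise: one branch per part, each reduced to 1 channel by a 1x1 conv,
-- then concatenated along dim 2 (channels).
function AttentionPartsCRF(numIn, inp , lrnSize, itersize, usepart)
    if usepart == 0 then
        return AttentionIter(numIn, inp , lrnSize, itersize)
    else
        local partnum = outputDim[1][1]
        local pre = {}
        for i=1, partnum do
            local att = AttentionIter(numIn, inp , lrnSize, itersize)
            local s = nnlib.SpatialConvolution(numIn,1,1,1,1,1,0,0)(att)
            table.insert(pre,s)
        end
        return nn.JoinTable(2)(pre)
    end

end
```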
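To step through the data flow without a Torch7 setup, here is a plain-Lua sketch (no torch, no nn) of what AttentionIter and AttentionPartsCRF compute for a single image. It is an illustration under my own assumptions, not the repository's code: feature maps are nested Lua tables indexed `feat[c][y][x]`; the helper functions and the parameter fields `wU`, `bU`, `wK`, `bK`, `w1`, `b1` are hypothetical names; convolutions are stride-1 cross-correlations with zero padding (as SpatialConvolution with `pad = floor(k/2)` computes); there is no batch dimension and no training.

```lua
-- Plain-Lua sketch (no Torch) of AttentionIter / AttentionPartsCRF.
-- Hypothetical layout: feat[c][y][x] for C-channel maps, map[y][x] for 1 channel.

local function filled(H, W, v)
  local m = {}
  for y = 1, H do
    m[y] = {}
    for x = 1, W do m[y][x] = v end
  end
  return m
end

local function sigmoid(v) return 1 / (1 + math.exp(-v)) end

-- Stride-1 cross-correlation of one map with an odd K x K kernel, zero padding
-- pad = floor(K/2) so the size is kept; the result is ADDED into `out`.
local function corr2d_add(out, map, kernel)
  local H, W, K = #map, #map[1], #kernel
  local pad = math.floor(K / 2)
  for y = 1, H do
    for x = 1, W do
      local s = 0
      for i = 1, K do
        for j = 1, K do
          local yy, xx = y + i - 1 - pad, x + j - 1 - pad
          if yy >= 1 and yy <= H and xx >= 1 and xx <= W then
            s = s + kernel[i][j] * map[yy][xx]
          end
        end
      end
      out[y][x] = out[y][x] + s
    end
  end
  return out
end

-- One branch. p.wU[c] (3x3 per input channel) and p.bU give U; p.wK / p.bK is
-- the kernel shared by every refinement step (like spConv / spConv_clone).
local function attention_iter(feat, p, iters)
  assert(iters >= 1, "itersize must be at least 1")
  local C, H, W = #feat, #feat[1], #feat[1][1]
  local U = filled(H, W, 0)
  for c = 1, C do corr2d_add(U, feat[c], p.wU[c]) end
  for y = 1, H do for x = 1, W do U[y][x] = U[y][x] + p.bU end end

  local Q = U  -- the first step convolves U itself, as in the original code
  for _ = 1, iters do
    local conv = corr2d_add(filled(H, W, 0), Q, p.wK)
    local nextQ = filled(H, W, 0)
    for y = 1, H do
      for x = 1, W do nextQ[y][x] = sigmoid(conv[y][x] + p.bK + U[y][x]) end
    end
    Q = nextQ
  end

  -- Broadcast the 1-channel map over all C channels and multiply element-wise.
  local out = {}
  for c = 1, C do
    out[c] = filled(H, W, 0)
    for y = 1, H do
      for x = 1, W do out[c][y][x] = feat[c][y][x] * Q[y][x] end
    end
  end
  return out, Q
end

-- Part attention: one independent branch per part, each reduced to one channel
-- by a 1x1 conv (p.w1[c], p.b1); the list of maps stands in for JoinTable(2).
local function attention_parts(feat, partParams, iters)
  local maps = {}
  for k = 1, #partParams do
    local pp = partParams[k]
    local att = attention_iter(feat, pp, iters)
    local H, W = #att[1], #att[1][1]
    local s = filled(H, W, 0)
    for y = 1, H do
      for x = 1, W do
        local v = pp.b1
        for c = 1, #att do v = v + pp.w1[c] * att[c][y][x] end
        s[y][x] = v
      end
    end
    maps[k] = s
  end
  return maps
end

-- Usage: 2 channels of 3x3 ones, all weights and biases zero.
local feat = { filled(3, 3, 1), filled(3, 3, 1) }
local params = { wU = { filled(3, 3, 0), filled(3, 3, 0) }, bU = 0,
                 wK = filled(3, 3, 0), bK = 0 }
local refined, Q = attention_iter(feat, params, 2)
print(Q[2][2], refined[1][2][2])  -- 0.5  0.5 (sigmoid(0) everywhere)
```

With all weights at zero every score is sigmoid(0) = 0.5, so the features are simply halved; non-zero `wK` kernels are what let neighboring locations influence each other across iterations. Keeping the branch as a separate function mirrors the original design, where the part variant just calls the holistic branch once per part.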

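Read the paper together with the attention explainer and go over the code a few more times, and the authors' reasoning should become clear.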

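There is also a write-up summarizing ***Multi-Context Attention for Human Pose Estimation*** that is many times better than mine and packed with substance, so I'm sharing it here. Click to go to the article: 》》》Multi-Context Attention for Human Pose Estimation《《《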