Coursera | Andrew Ng (03-week1-1.10): Understanding Human-Level Performance

This series only adds personal study notes and supplementary derivations on top of the original course; if there are any mistakes, corrections are welcome. Having studied Andrew Ng's course, I organized it into text to make review easier. Since I have been studying English, the series is primarily in English, and I suggest readers also treat the English as primary, with Chinese as support, to lay the groundwork for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | 网易云课堂 (NetEase Cloud Classroom)


Please credit the author and source when reposting: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79157927


1.10 Understanding human-level performance

(Subtitle source: 网易云课堂 NetEase Cloud Classroom)

The term human-level performance is sometimes used casually in research articles. But let me show you how we can define it a bit more precisely, and in particular, use the definition of the phrase human-level performance that is most useful for helping you drive progress in your machine learning project. So remember from our last video that one of the uses of this phrase, human-level error, is that it gives us a way of estimating Bayes error: what is the best possible error any function could, either now or in the future, ever achieve? So bearing that in mind, let's look at a medical image classification example.

Let's say that you want to look at a radiology image like this, and make a diagnosis classification decision. And suppose that a typical human, an untrained human, achieves 3% error on this task. A typical doctor, maybe a typical radiologist, achieves 1% error. An experienced doctor does even better, 0.7% error. And a team of experienced doctors, that is, if you get a team of experienced doctors and have them all look at the image and discuss and debate it, their consensus opinion achieves 0.5% error.

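To make these numbers easy to play with, here is a minimal Python sketch of the scenario; the error rates are the lecture's illustrative figures, and the `human_errors` name is my own, hypothetical one:

```python
# Hypothetical numbers from the lecture's radiology example,
# written as fractions rather than percentages.
human_errors = {
    "typical human": 0.03,
    "typical doctor": 0.01,
    "experienced doctor": 0.007,
    "team of experienced doctors": 0.005,
}
```
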
So the question I want to pose to you is, how should you define human-level error? Is human-level error 3%, 1%, 0.7% or 0.5%? Feel free to pause this video to think about it if you wish. To answer that question, I would urge you to bear in mind that one of the most useful ways to think of human error is as a proxy or an estimate for Bayes error. So here's how I would define human-level error. If you want a proxy or an estimate for Bayes error, then given that a team of experienced doctors discussing and debating can achieve 0.5% error, we know that Bayes error is less than or equal to 0.5%. Because some system, this team of doctors, can achieve 0.5% error, by definition the optimal error has to be 0.5% or lower. We don't know how much better it is; maybe there's an even larger team of even more experienced doctors who could do even better, so maybe it's even a little bit better than 0.5%. But we know the optimal error cannot be higher than 0.5%. So what I would do in this setting is use 0.5% as our estimate for Bayes error, and I would define human-level performance as 0.5%, at least if you're hoping to use human-level error in the analysis of bias and variance as we saw in the last video. Now, for the purpose of publishing a research paper, or for the purpose of deploying a system, maybe there's a different definition of human-level error that you can use: so long as you surpass the performance of a typical doctor. That seems like a very useful result if accomplished, and surpassing a single radiologist, a single doctor's performance, might mean the system is good enough to deploy in some context. So the takeaway from this is to be clear about what your purpose is in defining the term human-level error. If it is to show that you can surpass a single human and therefore argue for deploying your system in some context, maybe the typical doctor's 1% is the appropriate definition. But if your goal is a proxy for Bayes error, then the team's 0.5% is the appropriate definition.

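Since no human system we measured beats the team's 0.5%, the smallest observed human-level error is the natural stand-in for Bayes error. Continuing the hypothetical sketch above:

```python
# Bayes error is upper-bounded by the best observed human-level error,
# so take the minimum over the measured human error rates as our proxy.
bayes_error_proxy = min(human_errors.values())
print(f"estimated Bayes error <= {bayes_error_proxy:.1%}")  # prints 0.5%
```
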
To see why this matters, let's look at an error analysis example. Let's say, for the medical imaging diagnosis example, that your training error is 5% and your dev error is 6%. And from the previous slide, our human-level performance, which I'm going to think of as a proxy for Bayes error, would be either 1%, 0.7% or 0.5%, depending on whether you defined it as a typical doctor's performance, an experienced doctor's, or the team of doctors'. And remember also our definitions from the previous video: the gap between Bayes error (or our estimate of it) and the training error is a measure of the avoidable bias, and the gap between training error and dev error is a measure or an estimate of how much of a variance problem you have in your learning algorithm.

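In code, the two gaps are simple subtractions. A minimal sketch under the same assumptions, with hypothetical names:

```python
def bias_variance_gaps(train_error, dev_error, bayes_estimate):
    """Return (avoidable bias, variance) as defined in the lecture:
    avoidable bias = training error - estimated Bayes error;
    variance       = dev error - training error."""
    avoidable_bias = train_error - bayes_estimate
    variance = dev_error - train_error
    return avoidable_bias, variance

# The example on this slide: 5% training error, 6% dev error,
# with the team of doctors' 0.5% as the Bayes estimate.
bias, var = bias_variance_gaps(0.05, 0.06, 0.005)
print(f"avoidable bias = {bias:.1%}, variance = {var:.1%}")  # 4.5%, 1.0%
```
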
So in this first example, whichever of these choices you make, the measure of avoidable bias will be something like 4%: somewhere between 4%, if you use the typical doctor's 1%, and 4.5%, if you use the team's 0.5%, whereas the variance gap is 1%. So in this example, I would say it doesn't really matter which of the definitions of human-level error you use, whether the typical doctor's error, the single experienced doctor's error, or the team of experienced doctors' error. Whether the avoidable bias is 4% or 4.5%, it is clearly bigger than the variance problem. And so in this case, you should focus on bias reduction techniques, such as training a bigger network.

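To check that the choice of human-level definition barely matters in this first example, you can recompute the avoidable bias under each candidate value, continuing the sketch from above:

```python
# Recompute the avoidable bias under each candidate definition of
# human-level error. The conclusion does not change.
for who in ("typical doctor", "experienced doctor",
            "team of experienced doctors"):
    bias, var = bias_variance_gaps(0.05, 0.06, human_errors[who])
    print(f"{who}: avoidable bias = {bias:.1%}, variance = {var:.1%}")
# typical doctor: 4.0% vs 1.0%; team: 4.5% vs 1.0% -- bias dominates either way.
```
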
Now let's look at a second example. Say your training error is 1% and your dev error is 5%. Then again it doesn't really matter, it seems somewhat academic, whether human-level performance is 1%, 0.7% or 0.5%, because whichever of these definitions you use, your measure of avoidable bias will be somewhere between 0%, if you use 1%, and 0.5%, if you use 0.5%, right? That's the gap between the human-level performance and your training error, whereas the variance gap is 4%. So this 4% is going to be much bigger than the avoidable bias either way, and that suggests you should focus on variance reduction techniques such as regularization or getting a bigger training set. But where it really matters is if your training error is, say, 0.7%, so you're doing really well now, and your dev error is 0.8%. In this case, it really matters that you use 0.5% as your estimate for Bayes error, because then your measure of how much avoidable bias you have is 0.2%, which is twice as big as your measure of variance, which is just 0.1%. And so this suggests that maybe both the bias and the variance are problems, but the avoidable bias is a bit bigger of a problem. And in this example, 0.5%, as we discussed on the previous slide, is the best measure of Bayes error, because a team of human doctors could achieve that performance.

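The same helper makes the contrast between the second and third examples explicit; only in the third does the choice of Bayes proxy change the verdict:

```python
# Example 2: train = 1%, dev = 5%. Avoidable bias is at most 0.5%,
# while the variance gap is 4%, so variance reduction dominates.
bias, var = bias_variance_gaps(0.01, 0.05, 0.005)
print(f"example 2: bias = {bias:.1%}, variance = {var:.1%}")

# Example 3: train = 0.7%, dev = 0.8%. With Bayes ~ 0.5%, avoidable bias
# (0.2%) is twice the variance (0.1%); had we used the single experienced
# doctor's 0.7% as the proxy, the bias problem would have looked like 0%.
bias, var = bias_variance_gaps(0.007, 0.008, 0.005)
print(f"example 3: bias = {bias:.1%}, variance = {var:.1%}")
```
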
If you use 0.7% as your proxy for Bayes error, you would have estimated the avoidable bias as pretty much 0%, and you might have missed that you actually should try to do better on your training set. So I hope this also gives a sense of why making progress in a machine learning problem gets harder as you achieve, or as you approach, human-level performance. In this example, once you've approached 0.7% error, unless you're very careful about estimating Bayes error, you might not know how far away you are from Bayes error, and therefore how much you should be trying to reduce avoidable bias. In fact, if all you knew was that a single typical doctor achieves 1% error, it might be very difficult to know if you should be trying to fit your training set even better. And this problem arises only when you're doing very well on your problem already, only when you're at 0.7%, 0.8%, really close to human-level performance. Whereas in the two examples on the left, when you were further away from human-level performance, it was easier to target your focus on bias or variance. So this is maybe an illustration of why, as you approach human-level performance, it is actually harder to tease out the bias and variance effects, and therefore why progress on your machine learning project just gets harder as you're doing really well.

So just to summarize what we've talked about: if you're trying to understand bias and variance where you have an estimate of human-level error, for a task that humans can do quite well, you can use human-level error as a proxy or an approximation for Bayes error. Then the difference between your training error and your estimate of Bayes error tells you how much avoidable bias is a problem, how much avoidable bias there is. And the difference between training error and dev error tells you how much variance is a problem, whether your algorithm is able to generalize from the training set to the dev set. The big difference between our discussion here and what we saw in an earlier course is that instead of comparing training error to 0% and just calling that the estimate of the bias, in this video we have a more nuanced analysis in which there is no particular expectation that you should get 0% error, because sometimes Bayes error is nonzero, and sometimes it's just not possible for anything to do better than a certain threshold of error.

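Pulling the recap together, here is a minimal, self-contained decision helper; the heuristic and the suggested tactics are illustrative sketches, not a prescription from the course:

```python
def suggest_focus(train_error, dev_error, bayes_estimate):
    """Illustrative heuristic: point at whichever gap is larger."""
    avoidable_bias = train_error - bayes_estimate
    variance = dev_error - train_error
    if avoidable_bias >= variance:
        return "focus on bias reduction (e.g. bigger network, train longer)"
    return "focus on variance reduction (e.g. regularization, more data)"

print(suggest_focus(0.05, 0.06, 0.005))  # bias-dominant example
print(suggest_focus(0.01, 0.05, 0.005))  # variance-dominant example
```
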
And so in the earlier course, we were measuring training error, seeing how much bigger than zero the training error was, and just using that to try to understand how big our bias is. That turns out to work just fine for problems where Bayes error is nearly 0%, such as recognizing cats: humans are near perfect at that, so Bayes error is also nearly zero, and the approach works okay. But for problems where the data is noisy, like speech recognition on very noisy audio, where it's sometimes just impossible to hear what was said and to get the correct transcription, having a better estimate for Bayes error can help you better estimate avoidable bias and variance, and therefore make better decisions on whether to focus on bias reduction tactics or on variance reduction tactics.

So to recap: having an estimate of human-level performance gives you an estimate of Bayes error, and this allows you to more quickly make decisions as to whether you should focus on trying to reduce the bias or trying to reduce the variance of your algorithm. These techniques will tend to work well until you surpass human-level performance, whereupon you might no longer have a good estimate of Bayes error that still helps you make this decision really clearly. Now, one of the exciting developments in deep learning has been that for more and more tasks we're actually able to surpass human-level performance. In the next video, let's talk more about the process of surpassing human-level performance.

Key takeaways:

Understanding human-level performance

For the medical image classification problem, suppose we have the following levels of performance:

  1. Typical human: 3% error
  2. Typical doctor: 1% error
  3. Experienced doctor: 0.7% error
  4. Team of experienced doctors: 0.5% error

In the context of driving the misdiagnosis rate as low as possible, i.e. using human-level error as a proxy for Bayes error, human-level error should be defined as: 0.5% error;

If the context is deploying a system or presenting research, surpassing a single typical doctor may be enough, so human-level error in that setting should be defined as: 1% error.

Summary:

A rough estimate of human-level error lets us estimate Bayes error, which in turn lets us decide more quickly whether to focus on reducing bias or reducing variance.

This decision technique usually works well, until the system's performance begins to surpass human-level performance; at that point our estimate of Bayes error is no longer accurate, and it becomes harder to keep improving the system by reducing bias or variance.


