There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.
– Albert Einstein
Advanced Analytics Professional: An Unbiased Observer – by Roopam
I think the best way to appreciate and enjoy the trivial is to travel. By trivial I mean doorknobs, posters, letterboxes, graffiti and everything else we never bother to turn our heads for in our own city. I experienced this last week while traveling with my wife across Florence and Tuscany. I think one’s level of awareness and curiosity goes up many-fold while traveling. In Florence, we stayed at a lovely bed-and-breakfast named Fiorenza. The breakfast was good and the people even better. There we met an amicable family from the UK with a one-year-old baby named Owen and his 7-year-old sister Kyra. Owen and Kyra were playing hide and seek while having their breakfast. Kyra hid behind the same chair repeatedly and jumped out to reveal herself to her younger brother. Owen was pleasantly surprised every single time. All humans are born curious. However, they lose that curiosity as they grow older and become familiar with things. This could be the reason why we never turn our heads for the trivial in our own city.
Being curious and aware requires constant energy and effort. Perhaps humans have a natural tendency to slip into a low-energy state. This is particularly dangerous for analysts, since their job requires finding meaning in something that seems mundane to others. In my opinion, the biggest challenge for analytics is not the sophistication of statistical algorithms or the enhancement of computing power, but for its practitioners to stay curious and constantly ask questions. Zen Buddhists try to achieve cosmic awareness by living in the moment. If that is too difficult, I would recommend that you treat your job like a wonderful travel destination and be a good tourist – curious and aware.
OK, so that was a bit of a detour from our original discussion on scorecards. However, there are a couple of reasons for telling you the above. Primarily, it explains why I was late in posting this part of the series. Secondly, I would like us to have a discussion on the importance and challenges of being curious at work and in life in general. I already have a few examples in mind, namely Louis Pasteur and Edward Lorenz, but that is for later.
Now, let’s continue with the topic for this part, i.e. model evaluation.
Model Evaluation & Validation: the test of the pudding is in the eating – by Roopam
When I was in high school, I joined a cricket academy during the summer vacation. Cricket is a game quite similar to baseball, and I shall use baseball terminology in parentheses for everyone to understand. The design of the training camp was to train for about a month, followed by a full game against kids at the same skill level from another club. There was a tall and lean kid with us in the camp; he was the star bowler (pitcher) throughout the training sessions. He used to bowl (pitch) some of the best yorkers (curve balls). We were quite sure he would outperform everyone in the game. We asked him to open the bowling; his first ball went for a six (home run), followed by several more. Maybe it was a mix of match pressure, expectations, and the crowd, but his performance was an absolute disaster. Later the coach told us that what happened was not unusual and that he had seen it several times before. At higher levels, the game is played not on the ground but in the space between the ears. Clearly, he was referring to players’ presence of mind and temperament.
As the famous saying goes, the test of the pudding is in the eating. One could be a star on the training fields but a complete flop in a match situation. The same is true for an analytical model. A model, after going through a round of training (Part 5 of the series), goes through several rounds of testing.
1. Out of sample test: remember Part 2, where we divided our sample into the training and the test sample. The first level of testing happens on the holdout or test sample. The model needs to perform as well on the test sample as it does on the training sample. Let us come back to this in the next section, where I will discuss the measures of performance and the ROC curve.
2. Out of time sample test: since the model was built on a sample of the portfolio with reasonable vintage (refer to Part 2), the analyst would like to test its performance on a more recent portfolio. The number of bad borrowers (90+ DPD) in this out of time sample will certainly be lower, but the overall trend of the good/bad ratio against scores will still be a good indicator of model performance. Additionally, the analyst could relax the condition for bad loans and consider 30+ DPD as bad. Again, the overall trend should match the scorecard estimations.
3. On field test: this is where the test of the pudding is. The analyst needs to be completely aware of any credit policy changes that the bank has gone through since the scorecard was developed and, more importantly, the impact those changes will have on the scorecard. Always remember that not every policy change will influence the scorecard – a good business understanding and a bit of common sense really help here. Regular monitoring, with recalibration of the scorecard when needed, is a good way to keep it updated.
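The first two tests above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loan records, the field names (`opened`, `score`, `max_dpd`), the cutoff date and the score bands are all made up for the example, and a higher score is assumed to mean lower risk.

```python
import random
from datetime import date

def split_samples(loans, cutoff, test_fraction=0.25, seed=42):
    """Return (train, test, out_of_time) lists of loans.

    Loans opened on or after `cutoff` form the out-of-time sample; the
    older loans are shuffled and divided into training and test samples.
    """
    older = [loan for loan in loans if loan["opened"] < cutoff]
    recent = [loan for loan in loans if loan["opened"] >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(older)
    n_test = int(len(older) * test_fraction)
    return older[n_test:], older[:n_test], recent

def bad_rate_by_band(loans, band_edges, dpd_threshold=90):
    """Bad rate per score band; a loan is bad if max_dpd >= dpd_threshold."""
    rates = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        band = [loan for loan in loans if low <= loan["score"] < high]
        if band:
            bads = sum(1 for loan in band if loan["max_dpd"] >= dpd_threshold)
            rates.append(bads / len(band))
        else:
            rates.append(None)  # empty band: no rate to report
    return rates

def is_non_increasing(rates):
    """True if the bad rate never rises as the score band goes up."""
    known = [r for r in rates if r is not None]
    return all(a >= b for a, b in zip(known, known[1:]))

# Hypothetical portfolio: 80 older loans and 20 recent ones.
portfolio = [
    {"opened": date(2012, 1, 1) if i < 80 else date(2013, 6, 1),
     "score": 300 + 3 * i, "max_dpd": 0}
    for i in range(100)
]
train, test, out_of_time = split_samples(portfolio, cutoff=date(2013, 1, 1))

# Hypothetical out-of-time sample with observed delinquency.
recent_loans = [
    {"score": 310, "max_dpd": 95}, {"score": 350, "max_dpd": 40},
    {"score": 380, "max_dpd": 120}, {"score": 390, "max_dpd": 0},
    {"score": 420, "max_dpd": 0}, {"score": 450, "max_dpd": 35},
    {"score": 470, "max_dpd": 0}, {"score": 490, "max_dpd": 100},
    {"score": 510, "max_dpd": 0}, {"score": 550, "max_dpd": 0},
    {"score": 580, "max_dpd": 30}, {"score": 590, "max_dpd": 0},
]
edges = [300, 400, 500, 600]
strict_rates = bad_rate_by_band(recent_loans, edges, dpd_threshold=90)
relaxed_rates = bad_rate_by_band(recent_loans, edges, dpd_threshold=30)
print(strict_rates, is_non_increasing(strict_rates))
print(relaxed_rates, is_non_increasing(relaxed_rates))
```

Relaxing the bad definition to 30+ DPD yields more bad cases per band, which makes the trend easier to read on a young portfolio. What matters in both runs is that the bad rate falls as the score band rises.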
There are several ways to test the performance of a scorecard, such as the confusion matrix, the KS statistic, Gini, and the area under the ROC curve (AUROC). The KS statistic is a widely used metric in scorecard development. However, I personally prefer the AUROC to the others. I must add that Gini is a variant of the AUROC (Gini = 2 × AUROC − 1). The reason for my liking of the AUROC could be my formal training in physics and engineering. I think it is a more holistic measure, and it lets the analyst visually analyze the model performance. I prefer graphs and visual statistics to raw numbers any day.
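To make these measures concrete, here is a minimal sketch in plain Python. It is not taken from the series: the scores and bad flags are invented, a higher score is assumed to mean lower risk, and the AUROC is computed by brute-force pairwise comparison, which is fine for a toy sample but slow on a real portfolio.

```python
def ks_statistic(scores, is_bad):
    """Largest gap between the cumulative share of bads and of goods,
    scanning scores from lowest to highest."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    pairs = sorted(zip(scores, is_bad))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Handle tied scores as one step.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks

def auroc(scores, is_bad):
    """Chance that a randomly picked good scores above a randomly picked
    bad; ties count as half."""
    bads = [s for s, b in zip(scores, is_bad) if b]
    goods = [s for s, b in zip(scores, is_bad) if not b]
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0
        for g in goods for b in bads
    )
    return wins / (len(goods) * len(bads))

def gini(scores, is_bad):
    """Gini coefficient derived from the AUROC."""
    return 2 * auroc(scores, is_bad) - 1

# Invented sample: 1 marks a bad borrower, 0 a good one.
scores = [320, 350, 400, 420, 480, 510, 560, 600]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0]

print("KS    :", round(ks_statistic(scores, is_bad), 3))
print("AUROC :", round(auroc(scores, is_bad), 3))
print("Gini  :", round(gini(scores, is_bad), 3))
```

KS reports the single score at which goods and bads are furthest apart, while AUROC and Gini summarize the separation across the whole score range.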
ROC Curve: for Credit Scorecard Model Validation and Evaluation – by Roopam
The adjacent graph shows ROC curves. The two axes of the plot are the true positive rate and the false positive rate. The plot shows how well the model separates good cases from bad ones. A perfect model will perfectly segregate good and bad cases. Hence, you will get 100% true positives right at the beginning (i.e. absolute lift), as shown by the green curve in the graph. However, as with anything in life, perfection does not exist. As they say: if it is too good to be true, it probably is. At the other extreme is a worthless model, the curve marked in red. Anything close to or below the red curve is no better than tossing a coin, so why bother with the effort of building a model? Finally, a typical scorecard ROC will look like the blue curve. The AUROC for a usual credit-scoring model is between 70% and 85%; the higher, the better. However, for some fraud and insurance models, an AUROC slightly above 60% is acceptable. Again, analysts should be sure about the business benefits of the scorecard before finalizing the ROC. A simple cost-benefit analysis helps significantly before finalizing the model and reporting it to top management.
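The three cases described above can be reproduced with a small sketch. It rests on a few assumptions that are not from the original post: a "positive" is a bad borrower, a borrower is flagged as bad when the score falls at or below a threshold, and the three toy samples are invented to mimic the perfect, worthless and typical curves.

```python
def roc_points(scores, is_bad):
    """(false positive rate, true positive rate) pairs obtained by raising
    the cut-off score step by step; tied scores move together."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    pairs = sorted(zip(scores, is_bad))
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                true_pos += 1   # a bad borrower correctly flagged
            else:
                false_pos += 1  # a good borrower wrongly flagged
            j += 1
        points.append((false_pos / n_good, true_pos / n_bad))
        i = j
    return points

def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Perfect model: every bad scores below every good.
perfect = roc_points([300, 310, 320, 500, 510, 520], [1, 1, 1, 0, 0, 0])

# Worthless model: the score carries no information at all.
worthless = roc_points([450, 450, 450, 450], [1, 0, 1, 0])

# Typical scorecard: mostly right, with one bad hiding among the goods.
typical = roc_points(
    [320, 350, 400, 420, 480, 510, 560, 600],
    [1, 1, 0, 1, 0, 0, 0, 0],
)

for name, pts in [("perfect", perfect), ("worthless", worthless), ("typical", typical)]:
    print(name, round(area_under(pts), 3))
```

The perfect curve climbs straight to a true positive rate of 1 before any false positives appear, and the worthless one is the diagonal. The typical curve sits between the two, and its area is the AUROC quoted for scorecards.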
I hope after reading this, you will pick up your camera and visit that unexplored nook at the corner of the street – and be ready for some wonderful surprises!