Verification Mind Games: How to Think Like a Verifier

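1. Effective verification requires verification engineers to think differently from designers. Specifically, verification is concerned with finding bugs in the design while adhering strictly to the protocol, hunting for corner cases, and keeping zero tolerance for inconsistencies in the design.

mindset: a fixed set of attitudes held by a person or group, sometimes described as mental inertia, groupthink, or a paradigm. Its influence on analysis and decision-making is very hard to counteract.

As a simple example, look at any verification engineer job posting: it reads as a collection of languages, methodologies, tools, and domain-specific knowledge.
 
Many experienced engineers can learn to read a specification, create a verification plan, and write enough code to implement it, yet still miss the purpose of verification at some critical point.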

This article tries to reveal the verification mindset through a series of key choices that can arise in a verification environment:
   1.  Is it the verification environment’s duty to accurately replicate the real world?
   2.  Is it acceptable for the testbench and/or testcases to make use of design signals?
   3.  Is it worthwhile to target corner cases that designers consider invalid?

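A sound verification mindset is very different from a design mindset. It takes years of experience, mentoring, and hard-won lessons to develop.
It runs through every decision, every line of code, every meeting, and every project. Say no to design-centric verification!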

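At present, the verification mindset remains underappreciated and under-served in the industry. This article analyzes why it is so critical to verification effectiveness, and also discusses decisions that affect the debuggability of a testbench. The focus is on what a verification engineer should do, rather than how to do it.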

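1.  INTEROPERATING VERSUS STRESSING
 The most important challenge in verification is drawing the line between stressing the design and interoperating with it. Only with the right mindset can that line be drawn correctly.
   Common areas where this choice comes up:
      1. Clock and data recovery (CDR)
      2. Handshaking under error conditions
      3. Status indicators, such as FIFO full and empty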

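A. Clock and Data Recovery
    1. Some protocols specify that data is sent and received without an accompanying clock signal. A design that implements such a protocol needs a clock and data recovery function to extract the clock from the incoming data. A verification component (VC) developed for this design faces a similar situation: no clock is sent or received, only data.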


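As the figure shows, when implementing the VC's monitor, the verification engineer must decide how to receive the data coming from the DUT. Should the monitor implement a CDR algorithm, one that has already been implemented in the design? Certainly not. A better solution is to implement a phase-locked loop (PLL) algorithm based on the same reference clock the DUT uses as its reference, and to use the PLL's output to sample the incoming data.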

This approach accomplishes the following:
  1.  It verifies that the DUT’s data stream is in sync with the reference clock
  2.  It avoids any possibility of the testbench masking a problem because the CDR algorithm is too tolerant
  3.  It has better simulation performance than doing costly checks on data rates
  4.  It is faster and simpler to implement than CDR
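
The masking risk in point 2 can be illustrated with a small Python sketch. This is not the paper's monitor. It is a toy model under stated assumptions: edge times are integers in picoseconds, the reference clock is phase-aligned at time 0, and the names `first_drifting_edge` and `tolerant_receiver_accepts` are hypothetical.

```python
def first_drifting_edge(edge_times_ps, bit_period_ps, tolerance_ps):
    """Strict monitor: return the first data edge that is off the
    reference-clock grid by more than the tolerance, or None."""
    for t in edge_times_ps:
        offset = t % bit_period_ps
        # Distance to the nearest bit boundary of the reference clock.
        drift = min(offset, bit_period_ps - offset)
        if drift > tolerance_ps:
            return t
    return None


def tolerant_receiver_accepts(edge_times_ps, bit_period_ps, max_jitter_ps):
    """Tolerant, CDR-like receiver: it re-aligns to every data edge,
    so a steady frequency error is silently absorbed."""
    prev = edge_times_ps[0]
    for t in edge_times_ps[1:]:
        interval = t - prev
        n_bits = max(1, round(interval / bit_period_ps))
        if abs(interval - n_bits * bit_period_ps) > max_jitter_ps:
            return False
        prev = t  # re-synchronize to the data itself
    return True


# Hypothetical edge times from a DUT that runs about 3% slow.
drifting = [0, 1030, 2060, 4120]
print(first_drifting_edge(drifting, 1000, 20))         # strict check flags 1030
print(tolerant_receiver_accepts(drifting, 1000, 400))  # tolerant receiver accepts
```

The same data stream is accepted by the tolerant receiver and rejected by the strict one, which is the behavior a verification monitor wants.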
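Compare the verification view with the design view:
When building an RTL CDR component, the designer strives to make the algorithm as robust as possible, so that it can interoperate with a wide range of external devices. The verification engineer, by contrast, tries to make the algorithm as unforgiving as possible, so it stresses the design and fails on the slightest deviation from the protocol specification.
The goal of verification is therefore not to replicate reality but to verify the design as thoroughly as possible, even if the result is not very realistic.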

 
B.  Handshaking Error Handling 
   Most protocols require feedback in the form of ACK/NACK packets to indicate whether the most recently received transfer contained an error.


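1. Following a design mindset, the VC would simply follow the protocol specification: automatically send an ACK when it receives an error-free transfer, and automatically send a NACK when it receives an erroneous one.
2. This overlooks one thing: under normal conditions the DUT never produces an erroneous transfer, so the testbench would never return a NACK.
3. The verification component therefore has to be able to create the error condition itself: the testcase writer must be able to manually control the VC to send a NACK in response.
4. There is also the abnormal-handshake case, in which no handshake response is returned to the design at all. This requires the VC to support error injection.
5. A VC that merely implements the complete protocol leads to ineffective verification and wasted effort.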
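
The points above can be sketched as a responder whose reply the testcase can override. This is a minimal Python illustration, not a real VC. The class `ResponderVC`, the enum `ResponseMode`, and the string packet names are all hypothetical.

```python
from enum import Enum


class ResponseMode(Enum):
    AUTO = "auto"                # follow the protocol
    FORCE_NACK = "force_nack"    # NACK even an error-free transfer
    NO_RESPONSE = "no_response"  # withhold the handshake entirely


class ResponderVC:
    """Hypothetical responder whose handshake the testcase controls."""

    def __init__(self):
        self.mode = ResponseMode.AUTO

    def respond(self, transfer_ok):
        """Return "ACK", "NACK", or None when no handshake is sent."""
        if self.mode is ResponseMode.NO_RESPONSE:
            return None
        if self.mode is ResponseMode.FORCE_NACK:
            return "NACK"
        return "ACK" if transfer_ok else "NACK"


vc = ResponderVC()
print(vc.respond(transfer_ok=True))   # protocol-following reply
vc.mode = ResponseMode.FORCE_NACK     # testcase injects an error
print(vc.respond(transfer_ok=True))   # DUT now has to handle a NACK
```

With only the `AUTO` mode, the DUT's NACK handling and its timeout path would never be exercised, because a healthy DUT never sends a bad transfer.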

C.  Status Indicators and Clocks (a typical case that is already well known, so it is omitted here)

OUTSIDE-THE-BOX VERIFICATION PLANNING 
A. Corner case identification:
   1. Function Input Parameters
       a. Whether something is valid or invalid is in fact irrelevant; what is important is whether such a scenario can ever happen. If the answer is “yes”, then it must be simulated to ensure that the design recovers from it.
       b. It is the responsibility of the verification engineer to take a higher-level view of things in order to build the best possible verification environment.
   
   2. Register Accesses:
      Consider this situation: a design is specified to have a low-power mode that can be activated by writing a ‘1’ to a given register bit. The bit’s default value is ‘0’, so the device is in normal mode by default.
      1. The tests that come to mind first are:
           -  Write ‘1’ to the low-power bit
           -  Check that the device enters low-power mode
           -  Write ‘0’ to the low-power bit
           -  Check that the device exits low-power mode
      2. But one case is missing:
           What happens when a ‘0’ is written to the bit when the bit is already ‘0’ (or a ‘1’ when it is already ‘1’)?
         A critical bug may be hiding here: for example, writing 0 while the bit is already 0 might incorrectly put the device into low-power mode.
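
To make the redundant-write case concrete, here is a small Python sketch with a deliberately planted bug. The class `ToggleBugDut` is a hypothetical DUT model invented for illustration: it toggles its mode on every write instead of using the written value.

```python
class ToggleBugDut:
    """Hypothetical DUT model with a planted bug: every write toggles
    the mode, regardless of the value written."""

    def __init__(self):
        self.low_power = False  # default: normal mode

    def write_low_power_bit(self, value):
        self.low_power = not self.low_power  # bug: ignores `value`


def run_sequence(dut, writes):
    """Apply each write, check the mode after it, return failing steps."""
    failures = []
    for step, value in enumerate(writes):
        dut.write_low_power_bit(value)
        if dut.low_power != bool(value):
            failures.append((step, value))
    return failures


BASIC = [1, 0]                    # the obvious test
WITH_REDUNDANT = [0, 1, 1, 0, 0]  # adds 0-when-0 and 1-when-1 writes

print(run_sequence(ToggleBugDut(), BASIC))           # bug goes unnoticed
print(run_sequence(ToggleBugDut(), WITH_REDUNDANT))  # bug is exposed
```

The obvious write-1-then-write-0 sequence passes on the buggy model. Only the redundant writes reveal the problem.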

The verification plan covers everything that can possibly happen, regardless of whether the DUT is designed to handle it and regardless of what the designer may say.
As you can imagine, doing error injection in creative ways greatly expands the search space for finding bugs, and so experience and a degree of gut-feeling is required to target those areas most likely to be concealing real bugs. Where is the line between interoperating with the design and stressing it?

PRIORITIZING DEBUGGABILITY 
1. A good testbench emphasizes debuggability!
    
    A.  Protocols with Bi-Directional Ports 
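Some devices try to save on pins and board trace routing by employing bi-directional ports for data and/or clock signals. By the very nature of such protocols, at least two devices are responsible for driving the data and clock signals; otherwise bi-directional ports would not be used.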

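The above shows one way to implement the interface. Its debuggability is poor: the DUT and every VC connect to the same bi-directional port, so it is hard to tell which component is actually driving the bus.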

Using a verification mindset, we make use of both unidirectional and bi-directional signals to achieve both ease of debug and adherence to the protocol.

(In Figure 10, “(highz1, strong0)” means “when the signal is assigned a ‘1’, it takes on ‘Z’; when it is assigned a ‘0’, it takes on ‘0’ ”).
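
The debug benefit of keeping one unidirectional signal per driver can be modeled roughly in Python. This is only a behavioral sketch, not the SystemVerilog interface from the paper. It assumes a pull-up on the shared line so that an undriven bus reads as 1, and the function `resolve_bus` and the driver names are hypothetical.

```python
def resolve_bus(drivers):
    """Resolve a shared open-drain style line.

    `drivers` maps a component name to the value it requests (0 or 1)
    on its own unidirectional signal. A requested 1 releases the line
    (high-Z), a requested 0 pulls it low. An assumed pull-up makes a
    fully released line read as 1.
    Returns (bus_value, names_of_components_pulling_low).
    """
    pulling_low = sorted(name for name, value in drivers.items() if value == 0)
    bus_value = 0 if pulling_low else 1
    return bus_value, pulling_low


# The bus value alone cannot say who drives it; the per-driver signals can.
print(resolve_bus({"dut": 1, "vc_master": 0, "vc_slave": 1}))
print(resolve_bus({"dut": 1, "vc_master": 1, "vc_slave": 1}))
```

Because each component keeps its own request signal, a log or waveform shows directly which component is pulling the line low.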

Inside the DUT, the signal is driven as follows:


ROUNDING OUT THE MINDSET 

To what extent must the verification component follow the design protocol?

The DUT is sending data without an accompanying clock - should my VC do Clock-Data-Recovery?... No.

The protocol has bi-directional signals with the potential for multiple masters. Should I split each signal into two at the VC interface level?... Yes.

 


V.  ROUNDING OUT THE MINDSET

A. No Coverage without Checking
  The earlier example of writing 0 to the low-power enable bit when the bit is already 0 highlights another aspect of the verification mindset: never do coverage on anything in the absence of doing checks. This rules out doing register value coverage, because it is misleading at best and a waste of compute resources in a large system-on-a-chip (SoC).
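
One way to enforce this rule is to make the check the only path to a coverage hit. The Python sketch below assumes a plain set as the coverage store, and the helper `check_and_cover` and the bin name are hypothetical.

```python
def check_and_cover(covered, bin_name, expected, actual):
    """Record a coverage bin only when the accompanying check passes."""
    if expected != actual:
        raise AssertionError(f"{bin_name}: expected {expected!r}, got {actual!r}")
    covered.add(bin_name)


covered = set()
# Writing 0 while the bit is already 0: the device must stay in normal mode.
check_and_cover(covered, "write0_when0", expected=False, actual=False)
print(sorted(covered))
```

A bin filled this way means the scenario happened and the design behaved correctly, which is stronger than recording that a register once held a particular value.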

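B. Approach to Debugging
    1. Waveform-based debugging is a poor fit for a VC, because many operations consume no simulation time and stimulus generation uses dynamic data structures. The right mindset is that the best debug tool is the logfile itself. For the logfile:
        a. Messages need an appropriate and consistent format.
        b. This does not mean waveforms can never be used for debugging, but the log should be relied on first.
        c. If a problem cannot be debugged from the logfile, the messaging needs to be improved.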
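
A consistent message format is easy to sketch with Python's standard `logging` module. The field layout (simulation time, severity, component, message) is one possible choice, not a format prescribed by the source. The helper `make_vc_logger` and the `sim_time` field are hypothetical.

```python
import io
import logging


def make_vc_logger(name, stream):
    """Build a logger that stamps every message with the same layout:
    simulation time, severity, component name, text."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(sim_time)8d ns | %(levelname)-5s | %(name)s | %(message)s"))
    logger.handlers = [handler]
    return logger


stream = io.StringIO()
log = make_vc_logger("vc.monitor", stream)
# `sim_time` is supplied per message through logging's `extra` mechanism.
log.info("received packet id=%d crc=ok", 7, extra={"sim_time": 120})
print(stream.getvalue(), end="")
```

With every component logging in the same layout, a failing run can be filtered by component or time window with ordinary text tools before any waveform is opened.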

C.  Zoom-In, Zoom-Out Thinking 
When zoomed-in the engineer does tasks such as:
-  Understand design specifications
-  Write verification plans based on the specification
-  Write code to implement the verification plan
-  Write testcase code
-  Debug failing testcases

Verification engineers must also do a series of tasks while zoomed-out such as:
-  Decide which design features to focus on to maximize bug discovery
-  Devise creative ways to tease bugs out
-  Allocate time so as to get the most important checking and coverage for the effort

The main reason for this difference is that verifiers need to deal with a larger scope than designers do.
   1. Unlike design work, which must be finished before tapeout, verification can continue after tapeout, or be stopped.
   2. To do this well, verifiers need access to a wide range of information: system architecture, design hot-spots, project schedule, and client deliverables.

D. What Are We Trying to Accomplish Here? 

E.  Coverage, Not Testcases 

F.  Liaison between Design Architect and Design Engineer 

G.  Quitting on First Error

Finally, a brief summary:

1. The protocol says that when an error condition is detected, the design must send a NACK packet. Should my VC automatically send NACK too?... No.

2. The design indicates FIFO fullness with signals “full” and “empty”. Can my VC or testcase make use of them to prevent overflow/underflow?... Yes, but only if checks are made on them.

3. Is it sufficient to limit error injection to the scenarios for which the DUT has detection capabilities?... No.

4. A design has a register bit that defaults to ‘0’, and causes the DUT to enter low-power mode when written with ‘1’. Is writing ‘0’ when it is already at ‘0’ important to test?... Yes.

5. Can I snoop the design’s internal clock to synchronize my VC to?... No.

6. Should I ask myself “what is it I’m trying to accomplish here?” when embarking on a new verification task?... Yes.

7. Is it my responsibility to ensure that the design architect and design engineer are on the same page?... Yes.

8. Should I regularly step back from low-level implementation and take a high-level view of the verification effort as a whole?... Yes.

9. A design draws circles of radius given by an input parameter. Is testing a radius of zero important?... Yes.

10. Should I implement coverage on individual register values?... No.

11. Should I be relying mainly on waveforms to debug my VC?... No.

12. Is coverage closure more important than testcase passing rate?... Yes.

13. Is it necessary to allow a simulation to continue running after it has encountered an error?... No.



