Detecting duplicate and near-duplicate code in a front-end codebase with jsinspect

Date: 2019-11-09


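This article, "Detecting duplicate and near-duplicate code in a front-end codebase with jsinspect", is part of the author's series "Web Front-End Primer and Engineering Practice". For more front-end learning material, see "Front-End Weekly Checklist, Issue 6: Angular 4.0 learning resources, Egg.js 1.0 released, six questions for a CTO on how programmers grow" and "Pan-Front-End Knowledge Map (Web/iOS/Android/RN)".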

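During development we often copy and paste large amounts of code, especially in the early stages of a project. Once the project stabilizes and its feature requirements settle, we need to think about optimizing and refactoring the codebase so that the code stays clear and maintainable. Good code avoids duplication as far as is reasonable and follows principles such as single responsibility and Single Source of Truth. In this section we use jsinspect to scan a codebase automatically and then refactor based on the duplicate or near-duplicate fragments it reports. Of course, we are not aiming to factor out every piece of shared code; excessive abstraction reduces readability and comprehensibility.

jsinspect uses babylon to build an AST for JavaScript or JSX code and marks structurally similar code blocks according to AST node types such as BlockStatement, VariableDeclaration, and ObjectExpression. We can install the jsinspect command globally with npm (npm install -g jsinspect); its usage is: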

Usage: jsinspect [options] <paths ...>


Detect copy-pasted and structurally similar JavaScript code
Example use: jsinspect -I -L -t 20 --ignore "test" ./path/to/src


Options:

  -h, --help                         output usage information
  -V, --version                      output the version number
  -t, --threshold <number>           number of nodes (default: 30)
  -m, --min-instances <number>       min instances for a match (default: 2)
  -c, --config                       path to config file (default: .jsinspectrc)
  -r, --reporter [default|json|pmd]  specify the reporter to use
  -I, --no-identifiers               do not match identifiers
  -L, --no-literals                  do not match literals
  -C, --no-color                     disable colors
  --ignore <pattern>                 ignore paths matching a regex
  --truncate <number>                length to truncate lines (default: 100, off: 0)

Alternatively, we can add a .jsinspectrc configuration file to the project directory to specify how jsinspect runs:

{
  "threshold":     30,
  "identifiers":   true,
  "literals":      true,
  "ignore":        "test|spec|mock",
  "reporter":      "json",
  "truncate":      100
}

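Once configured, we can run jsinspect -t 50 --ignore "test" ./path/to/src to analyze the codebase. In one codebase the author came across, it detected more than a hundred duplicated fragments; a typical one is shown below. A component repeats the same password input element in several places. We could wrap it in a function component that encapsulates the shared properties such as label and hintText, reducing the duplication.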

Match - 2 instances

./src/view/main/component/tabs/account/operation/login/forget_password.js:96,110
return <div className="my_register__register">
    <div className="item">
        <Paper zDepth={2}>
            <EnhancedTextFieldWithLabel
                label="密码"
                hintText="请输入密码,6-20位字母,数字"
                onChange={(event, value)=> {
                    this.setState({
                        userPwd: value
                    })
                }}
            />
        </Paper>
    </div>
    <div className="item">

./src/view/main/component/tabs/my/login/forget_password.js:111,125
return <div className="my_register__register">
    <div className="item">
        <Paper zDepth={2}>
            <EnhancedTextFieldWithLabel
                label="密码"
                hintText="请输入密码,6-20位字母,数字"
                onChange={(event, value)=> {
                    this.setState({
                        userPwd: value
                    })
                }}
            />
        </Paper>
    </div>
    <div className="item">
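One lightweight way to act on a match like the one above is to move the shared properties into a single helper. The sketch below is a plain JavaScript variant of the function-component idea, chosen because JSX needs a build step to run. The name passwordFieldProps and its onValue callback are hypothetical and do not come from the analyzed codebase.

```javascript
// Hypothetical helper: builds the props shared by every password field.
// `onValue` is called with the new text whenever the field changes.
function passwordFieldProps(onValue) {
  return {
    label: '密码',
    hintText: '请输入密码,6-20位字母,数字',
    onChange: (event, value) => onValue(value),
  };
}

// Intended use inside a component's render method (JSX, shown as a comment):
//   <EnhancedTextFieldWithLabel
//     {...passwordFieldProps((value) => this.setState({ userPwd: value }))}
//   />

// Simulate a change event without React to show the data flow.
const state = {};
const props = passwordFieldProps((value) => { state.userPwd = value; });
props.onChange(null, 'secret123');
console.log(state.userPwd);
```

Both forget_password.js files could then spread the same props, so the label and hint text live in one place.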

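The author also ran a brief analysis of the React source code and found 16 near-duplicate fragments across 246 files. Most of them stem from the transition period between the current Stack-based reconciliation algorithm and the Fiber-based rewrite, for example: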

Match - 2 instances

./src/renderers/dom/fiber/wrappers/ReactDOMFiberTextarea.js:134,153
  var value = props.value;
  if (value != null) {
    // Cast `value` to a string to ensure the value is set correctly. While
    // browsers typically do this as necessary, jsdom doesn't.
    var newValue = '' + value;

    // To avoid side effects (such as losing text selection), only set value if changed
    if (newValue !== node.value) {
      node.value = newValue;
    }
    if (props.defaultValue == null) {
      node.defaultValue = newValue;
    }
  }
  if (props.defaultValue != null) {
    node.defaultValue = props.defaultValue;
  }
},

postMountWrapper: function(element: Element, props: Object) {

./src/renderers/dom/stack/client/wrappers/ReactDOMTextarea.js:129,148
  var value = props.value;
  if (value != null) {
    // Cast `value` to a string to ensure the value is set correctly. While
    // browsers typically do this as necessary, jsdom doesn't.
    var newValue = '' + value;

    // To avoid side effects (such as losing text selection), only set value if changed
    if (newValue !== node.value) {
      node.value = newValue;
    }
    if (props.defaultValue == null) {
      node.defaultValue = newValue;
    }
  }
  if (props.defaultValue != null) {
    node.defaultValue = props.defaultValue;
  }
},

postMountWrapper: function(inst) {

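The author's view is that while developing new features we do not need to think about refactoring at every moment; new functionality should instead be developed relatively independently. Finally, let us briefly look at how jsinspect works, so that we can build similar custom tools when a project needs to match or extract particular kinds of code. The core workflow of jsinspect is captured in inspector.js: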

... 
this._filePaths.forEach((filePath) => {
  var src = fs.readFileSync(filePath, {encoding: 'utf8'});
  this._fileContents[filePath] = src.split('\n');
  var syntaxTree = parse(src, filePath);
  this._traversals[filePath] = nodeUtils.getDFSTraversal(syntaxTree);
  this._walk(syntaxTree, (nodes) => this._insert(nodes));
});

this._analyze();
...

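The flow is fairly clear: jsinspect iterates over all valid source files, reads each file's contents, and parses them into an AST with babylon. The syntax tree for one file looks like this (the last line is the corresponding _fileContents entry):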

Node {
  type: 'Program',
  start: 0,
  end: 31,
  loc:
   SourceLocation {
     start: Position { line: 1, column: 0 },
     end: Position { line: 2, column: 15 },
     filename: './__test__/a.js' },
  sourceType: 'script',
  body:
   [ Node {
       type: 'ExpressionStatement',
       start: 0,
       end: 15,
       loc: [Object],
       expression: [Object] },
     Node {
       type: 'ExpressionStatement',
       start: 16,
       end: 31,
       loc: [Object],
       expression: [Object] } ],
  directives: [] }
{ './__test__/a.js': [ 'console.log(a);', 'console.log(b);' ] }

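Next, a depth-first traversal of the AST builds an array of all its nodes, and jsinspect walks that array to construct the objects to compare. The -t option passed at run time sets the size, in nodes, of the atomic unit of comparison. With it set to 2, the internal map _map built during the traversal looks like this: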

{ 'uj3VAExwF5Avx0SGBDFu8beU+Lk=': [ [ [Object], [Object] ], [ [Object], [Object] ] ],
  'eMqg1hUXEFYNbKkbsd2QWECLiYU=': [ [ [Object], [Object] ], [ [Object], [Object] ] ],
  'gvSCaZfmhte6tfnpfmnTeH+eylw=': [ [ [Object], [Object] ], [ [Object], [Object] ] ],
  'eHqT9EuPomhWLlo9nwU0DWOkcXk=': [ [ [Object], [Object] ], [ [Object], [Object] ] ] }
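The traversal-and-hashing step described above can be approximated in a few lines. This is a simplified sketch rather than jsinspect's actual code: the helpers dfs, hashKey, and buildMap are hypothetical, the AST is written by hand instead of being produced by babylon, and the choice of SHA-1 over the joined node types is an assumption suggested by the shape of the keys in the dump.

```javascript
const crypto = require('crypto');

// Collect every AST node (any object with a string `type`) in pre-order.
function dfs(node, out = []) {
  if (!node || typeof node.type !== 'string') return out;
  out.push(node);
  for (const key of Object.keys(node)) {
    if (key === 'loc') continue; // location info is not part of the structure
    const child = node[key];
    if (Array.isArray(child)) child.forEach((c) => dfs(c, out));
    else if (child && typeof child === 'object') dfs(child, out);
  }
  return out;
}

// Hash a group of nodes by its sequence of node types only (assumed scheme).
function hashKey(nodes) {
  const key = nodes.map((n) => n.type).join(':');
  return crypto.createHash('sha1').update(key).digest('base64');
}

// Slide a window of `threshold` nodes over the traversal and group by hash.
function buildMap(ast, threshold) {
  const nodes = dfs(ast);
  const map = {};
  for (let i = 0; i + threshold <= nodes.length; i++) {
    const group = nodes.slice(i, i + threshold);
    const key = hashKey(group);
    (map[key] = map[key] || []).push(group);
  }
  return map;
}

// Hand-written AST for: console.log(a); console.log(b);
const call = (name) => ({
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
    },
    arguments: [{ type: 'Identifier', name }],
  },
});
const ast = { type: 'Program', body: [call('a'), call('b')] };

const map = buildMap(ast, 2);
// Keys that collected at least two groups are candidate matches.
const duplicates = Object.values(map).filter((groups) => groups.length >= 2);
console.log(Object.keys(map).length, duplicates.length);
```

Because only node types are hashed, two groups with different identifier names or literal values still land under the same key, which is what lets structurally similar code be reported.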

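On a large codebase this can produce many overlapping instances, so the _omitOverlappingInstances function is used to deduplicate them: if one instance contains nodes abcd and another contains nodes bcde, the latter is removed from the array. Another optimization that speeds things up is to remove already-matched fragments after each comparison: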

_prune(nodeArrays) {
  for (let i = 0; i < nodeArrays.length; i++) {
    let nodes = nodeArrays[i];
    for (let j = 0; j < nodes.length; j++) {
      this._removeNode(nodes[j]);
    }
  }
}
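The overlap-removal rule mentioned earlier (keep abcd, drop bcde) can be sketched as a greedy filter over the instances. This is an illustration under stated assumptions, not the library's source: the standalone function omitOverlappingInstances and the toy node objects are hypothetical.

```javascript
// Keep an instance only if none of its nodes was already claimed by an
// earlier kept instance (greedy, first come first served).
function omitOverlappingInstances(instances) {
  const seen = new Set();
  return instances.filter((nodes) => {
    if (nodes.some((node) => seen.has(node))) return false;
    nodes.forEach((node) => seen.add(node));
    return true;
  });
}

// Toy nodes a..g; node identity (not value) decides whether instances overlap.
const [a, b, c, d, e, f, g] = 'abcdefg'.split('').map((id) => ({ id }));
const kept = omitOverlappingInstances([
  [a, b, c, d],
  [b, c, d, e], // shares b, c, d with the first instance, so it is dropped
  [e, f, g],    // shares nothing with a kept instance, so it is retained
]);
const keptIds = kept.map((nodes) => nodes.map((n) => n.id).join(''));
console.log(keptIds);
```

Nodes of a dropped instance are not recorded, so a later instance that overlaps only with the dropped one is still kept.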