Natural Language Processing in the Browser

Translated by 画北, Alibaba Taobao (阿里淘系) F(x) Team

Original article: Natural Language Processing in the Browser

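It is now possible to build a chatbot for a website without relying on third-party services such as Dialogflow or Watson, and without a server. In this article I will show how to build a chatbot that runs entirely in the browser.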

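This article assumes some familiarity with JavaScript and an understanding of how natural language processing works, but it does not require advanced machine learning knowledge or experience.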

Machine learning in the browser with JavaScript sounds crazy, but you are about to watch a chatbot come to life.

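We will build on NLP.js (version 4), an open-source natural language processing library written in JavaScript. The library lets you train NLP directly in the browser from a corpus, and attach a hook to any intent to change its answer programmatically.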

The final project is available in the GitHub repository. You can download it, open index.html, and talk to the finished chatbot.

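These days every real developer should have some experience with artificial intelligence, and what sounds more like science fiction than talking to your computer through something you built yourself?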

Installing the packages

Create a new npm project in any folder and install the NLP packages:

npm i -D @nlpjs/core @nlpjs/lang-en-min @nlpjs/nlp @nlpjs/request-rn

We also need browserify and terser so that we can build NLP for use in the browser:

npm i -D browserify terser

Freshly installed packages give you that new-project smell. Enjoy it.

Building NLP

The first step is to build NLP with browserify and terser. For that, we only need a basic setup in buildable.js:

const core = require('@nlpjs/core');
const nlp = require('@nlpjs/nlp');
const langenmin = require('@nlpjs/lang-en-min');
const requestrn = require('@nlpjs/request-rn');

window.nlpjs = { ...core, ...nlp, ...langenmin, ...requestrn };

We use only the NLP core and the small English package. To build everything, add a build command to package.json:

{
  "name": "nlpjs-web",
  "version": "1.0.0",
  "scripts": {
    "build": "browserify ./buildable.js | terser --compress --mangle > ./dist/bundle.js"
  },
  "devDependencies": {
    "@nlpjs/core": "^4.14.0",
    "@nlpjs/lang-en-min": "^4.14.0",
    "@nlpjs/nlp": "^4.15.0",
    "@nlpjs/request-rn": "^4.14.3",
    "browserify": "^17.0.0",
    "terser": "^5.3.8"
  }
}

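Now run the build. The command redirects its output into ./dist/bundle.js, so create the dist folder first if it does not exist yet: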

npm run build

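The resulting ./dist/bundle.js is only about 137 KB. It is also worth noting that NLP.js has an impressive list of supported languages. However, only English has a version optimized for the browser.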

Training NLP in the browser

Now that the bundle is built, we can train NLP in the browser. Start by creating index.html:

<html>
<head>
    <title>NLP in a browser</title>
    <script src='./dist/bundle.js'></script>
    <script>
        const { containerBootstrap, Nlp, LangEn, fs } = window.nlpjs;

        // Set up NLP.js, load the corpus from a URL, and train on it.
        const setupNLP = async corpus => {
            const container = containerBootstrap();
            container.register('fs', fs);
            container.use(Nlp);
            container.use(LangEn);
            const nlp = container.get('nlp');
            nlp.settings.autoSave = false;
            await nlp.addCorpus(corpus);
            // Awaited so the model is ready before it is used.
            await nlp.train();
            return nlp;
        };

        (async () => {
            const nlp = await setupNLP('https://raw.githubusercontent.com/jesus-seijas-sp/nlpjs-examples/master/01.quickstart/02.filecorpus/corpus-en.json');
        })();
    </script>
</head>
<body>
    <h1>NLP in a browser</h1>
    <div id="chat"></div>
    <form id="chatbotForm">
        <input type="text" id="chatInput" />
        <input type="submit" id="chatSubmit" value="send" />
    </form>
</body>
</html>

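The setupNLP function takes care of setting up the library and training it. The corpus is a JSON file that defines how our chatbot converses, using the following format: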

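"intent" is the unique identifier of a conversation node. Its name should express the user intent that the chatbot responds to.
"utterances" is a list of training examples of what a user might say to trigger the intent.
"answers" is a list of responses from which the chatbot picks one at random.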

{
  "name": "Corpus",
  "locale": "en-US",
  "data": [
    {
      "intent": "agent.acquaintance",
      "utterances": [
        "say about you",
        "why are you here",
        "what is your personality",
        "describe yourself",
        "tell me about yourself",
        "tell me about you",
        "what are you",
        "who are you",
        "I want to know more about you",
        "talk about yourself"
      ],
      "answers": [
        "I'm a virtual agent",
        "Think of me as a virtual agent",
        "Well, I'm not a person, I'm a virtual agent",
        "I'm a virtual being, not a real person",
        "I'm a conversational app"
      ]
    },
    {
      "intent": "agent.age",
      "utterances": [
        "your age",
        "how old is your platform",
        "how old are you",
        "what's your age",
        "I'd like to know your age",
        "tell me your age"
      ],
      "answers": [
        "I'm very young",
        "I was created recently",
        "Age is just a number. You're only as old as you feel"
      ]
    }
  ]
}
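
To make "picks one at random" concrete, here is a minimal sketch of that selection step. The pickAnswer helper is hypothetical and written only for illustration; it is not how NLP.js is implemented internally.

```javascript
// Illustration only: pickAnswer is a hypothetical helper, not part of NLP.js.
// It shows what "pick one answer at random" means for an intent's answers.
function pickAnswer(answers, random = Math.random) {
  if (!Array.isArray(answers) || answers.length === 0) {
    return undefined; // an intent without answers has nothing to say
  }
  const index = Math.floor(random() * answers.length);
  return answers[index];
}

// The answers of the "agent.age" intent from the corpus above.
const ageAnswers = [
  "I'm very young",
  'I was created recently',
  "Age is just a number. You're only as old as you feel",
];

console.log(pickAnswer(ageAnswers)); // one of the three strings above
```

Passing the random source as a parameter keeps the helper deterministic in tests while defaulting to Math.random in normal use.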

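To train our chatbot, we borrow a larger corpus from the library's examples. For your own use case, feel free to create your own corpus. Just remember that the library expects to read the corpus from a URL.

When you open index.html in the browser, you should see a simple chat form that does not do anything yet.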
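
If you do write your own corpus, a small check before publishing it can catch obvious mistakes. The sketch below assumes only the format described above ("intent", "utterances", "answers" inside "data"). validateCorpus is a hypothetical helper, not an NLP.js function, and the rules it enforces are my own reading of that format.

```javascript
// Hypothetical helper (not part of NLP.js): returns a list of problems found
// in a corpus object that follows the format described in this article.
function validateCorpus(corpus) {
  if (!corpus || !Array.isArray(corpus.data)) {
    return ['corpus.data must be an array'];
  }
  const problems = [];
  const seenIntents = new Set();
  corpus.data.forEach((entry, index) => {
    const intent = entry && entry.intent;
    if (typeof intent !== 'string' || intent === '') {
      problems.push(`entry ${index}: missing "intent"`);
      return;
    }
    // The intent is meant to be a unique identifier.
    if (seenIntents.has(intent)) {
      problems.push(`${intent}: duplicate intent`);
    }
    seenIntents.add(intent);
    if (!Array.isArray(entry.utterances) || entry.utterances.length === 0) {
      problems.push(`${intent}: needs at least one utterance`);
    }
    if (!Array.isArray(entry.answers)) {
      problems.push(`${intent}: "answers" must be an array`);
    }
  });
  return problems;
}

const draftCorpus = {
  name: 'Corpus',
  locale: 'en-US',
  data: [
    { intent: 'agent.age', utterances: ['how old are you'], answers: ["I'm very young"] },
    { intent: 'agent.age', utterances: [], answers: [] },
  ],
};

console.log(validateCorpus(draftCorpus));
// [ 'agent.age: duplicate intent', 'agent.age: needs at least one utterance' ]
```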

However, if you open the browser console, you can already see the output of a successful training run.

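Training is very fast and makes the trained model available to the chatbot in the browser. This is the more efficient approach, because the corpus file is much smaller than the generated model.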
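
To see how fast training is on your own machine, you could wrap the setup call in a small timer. This is a sketch only: timed is a hypothetical helper, and the commented usage line assumes the setupNLP function from index.html and a corpusUrl variable that you define yourself.

```javascript
// Hypothetical helper: runs an async step and reports how long it took.
async function timed(label, step) {
  const start = Date.now();
  const result = await step();
  const elapsedMs = Date.now() - start;
  console.log(`${label} took ${elapsedMs} ms`);
  return { result, elapsedMs };
}

// Usage inside the async function in index.html (corpusUrl is an assumption):
// const { result: nlp } = await timed('training', () => setupNLP(corpusUrl));
```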

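Training your first piece of machine learning code feels great. You have just become a legend, one of the few people on this planet who can say: "Yes, I trained an AI once, no big deal."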

Chatbot HTML

Now we will make the chatbot form work by adding an onChatSubmit function to index.html:

<html>
<head>
    <title>NLP in a browser</title>
    <script src='./dist/bundle.js'></script>
    <script>
        const { containerBootstrap, Nlp, LangEn, fs } = window.nlpjs;

        // Set up NLP.js, load the corpus from a URL, and train on it.
        const setupNLP = async corpus => {
            const container = containerBootstrap();
            container.register('fs', fs);
            container.use(Nlp);
            container.use(LangEn);
            const nlp = container.get('nlp');
            nlp.settings.autoSave = false;
            await nlp.addCorpus(corpus);
            // Awaited so the model is ready before it is used.
            await nlp.train();
            return nlp;
        };

        // Returns a submit handler that sends the input to NLP.js and
        // appends both the user's message and the answer to the chat.
        const onChatSubmit = nlp => async event => {
            event.preventDefault();
            const chat = document.getElementById('chat');
            const chatInput = document.getElementById('chatInput');
            chat.innerHTML = chat.innerHTML + `<p>you: ${chatInput.value}</p>`;
            const response = await nlp.process('en', chatInput.value);
            chat.innerHTML = chat.innerHTML + `<p>chatbot: ${response.answer}</p>`;
            chatInput.value = '';
        };

        (async () => {
            const nlp = await setupNLP('https://raw.githubusercontent.com/jesus-seijas-sp/nlpjs-examples/master/01.quickstart/02.filecorpus/corpus-en.json');
            const chatForm = document.getElementById('chatbotForm');
            chatForm.addEventListener('submit', onChatSubmit(nlp));
        })();
    </script>
</head>
<body>
<h1>NLP in a browser</h1>
<div id="chat"></div>
<form id="chatbotForm">
    <input type="text" id="chatInput" />
    <input type="submit" id="chatSubmit" value="send" />
</form>
</body>
</html>
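
The handler writes the typed text straight into innerHTML, so any markup the user types is interpreted by the browser, and a message without an answer shows up as "chatbot: undefined". A minimal sketch of one way to harden this follows. escapeHtml and chatLine are hypothetical helpers, not part of NLP.js, and the fallback sentence is my own placeholder.

```javascript
// Hypothetical helpers, not part of NLP.js.

// Escape characters that have special meaning in HTML, so the text is shown
// as text instead of being interpreted as markup when assigned to innerHTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build one chat line; use a fallback sentence when there is no text to show.
function chatLine(speaker, text, fallback = 'Sorry, I did not understand that.') {
  const shown = text === undefined || text === null || text === '' ? fallback : text;
  return `<p>${escapeHtml(speaker)}: ${escapeHtml(shown)}</p>`;
}

console.log(chatLine('you', '<b>hi</b>'));
// <p>you: &lt;b&gt;hi&lt;/b&gt;</p>
```

In onChatSubmit, the two innerHTML lines could then become chat.innerHTML += chatLine('you', chatInput.value) and chat.innerHTML += chatLine('chatbot', response.answer).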

Now you can talk to your new chatbot.

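Browse the corpus JSON file to see which conversation topics are supported. You can now show this off to your friends at the bar and easily earn their admiration, because you are a real hacker now.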

Adding hooks to intents

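You may want your chatbot to run some extra code for each intent, or to replace the answers of some intents with the result of an API call. Let's extend index.html to its final version.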

<html>
<head>
    <title>NLP in a browser</title>
    <script src='./dist/bundle.js'></script>
    <script>
        const { containerBootstrap, Nlp, LangEn, fs } = window.nlpjs;

        // Hook called with the result of every processed message.
        function onIntent(nlp, input) {
            console.log(input);
            if (input.intent === 'greetings.hello') {
                // Replace the answer with a greeting based on the local time.
                const hours = new Date().getHours();
                const output = input;
                if (hours < 12) {
                    output.answer = 'Good morning!';
                } else if (hours < 17) {
                    output.answer = 'Good afternoon!';
                } else {
                    output.answer = 'Good evening!';
                }
                return output;
            }
            return input;
        }

        // Set up NLP.js, register the hook, load the corpus, and train on it.
        const setupNLP = async corpus => {
            const container = containerBootstrap();
            container.register('fs', fs);
            container.use(Nlp);
            container.use(LangEn);
            const nlp = container.get('nlp');
            nlp.onIntent = onIntent;
            nlp.settings.autoSave = false;
            await nlp.addCorpus(corpus);
            // Awaited so the model is ready before it is used.
            await nlp.train();
            return nlp;
        };

        // Returns a submit handler that sends the input to NLP.js and
        // appends both the user's message and the answer to the chat.
        const onChatSubmit = nlp => async event => {
            event.preventDefault();
            const chat = document.getElementById('chat');
            const chatInput = document.getElementById('chatInput');
            chat.innerHTML = chat.innerHTML + `<p>you: ${chatInput.value}</p>`;
            const response = await nlp.process('en', chatInput.value);
            chat.innerHTML = chat.innerHTML + `<p>chatbot: ${response.answer}</p>`;
            chatInput.value = '';
        };

        (async () => {
            const nlp = await setupNLP('https://raw.githubusercontent.com/jesus-seijas-sp/nlpjs-examples/master/01.quickstart/02.filecorpus/corpus-en.json');
            const chatForm = document.getElementById('chatbotForm');
            chatForm.addEventListener('submit', onChatSubmit(nlp));
        })();
    </script>
</head>
<body>
<h1>NLP in a browser</h1>
<div id="chat"></div>
<form id="chatbotForm">
    <input type="text" id="chatInput" />
    <input type="submit" id="chatSubmit" value="send" />
</form>
</body>
</html>

One line was added to setupNLP:

nlp.onIntent = onIntent;

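Then we create the onIntent function. Note that onIntent runs for every intent and logs the resulting object to the console. For the greetings.hello intent it also adds logic, replacing the output with an answer based on the user's current time. In my case, it is afternoon.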
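
Once more than one intent needs custom logic, the if chain inside onIntent can grow quickly. One possible way to organize it, sketched here under my own assumptions, is a table of handlers keyed by intent name. greetingForHour, intentHandlers, and createOnIntent are hypothetical names; only the (nlp, input) signature and the answer field come from the code above.

```javascript
// Pure function: maps an hour of the day (0-23) to a greeting, using the same
// thresholds as the onIntent function in index.html.
function greetingForHour(hours) {
  if (hours < 12) return 'Good morning!';
  if (hours < 17) return 'Good afternoon!';
  return 'Good evening!';
}

// Hypothetical table of per-intent handlers. Each handler receives the result
// object and returns the answer that should replace the default one.
const intentHandlers = {
  'greetings.hello': () => greetingForHour(new Date().getHours()),
};

// Builds a function with the same (nlp, input) shape as onIntent.
function createOnIntent(handlers) {
  return function onIntent(nlp, input) {
    const known = Object.prototype.hasOwnProperty.call(handlers, input.intent);
    if (!known) return input; // no handler: keep the default answer
    return { ...input, answer: handlers[input.intent](input) };
  };
}

// In setupNLP this would replace the existing assignment:
// nlp.onIntent = createOnIntent(intentHandlers);
```

Keeping the time logic in a pure function such as greetingForHour also makes it testable without a browser or a trained model.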

Isn't that great? High five if you are about to found your own AI startup.

Known limitations

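Note that the browser build of NLP.js does not support some common natural language processing features, such as the named entities or entity extraction available in the full library.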

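As a library, NLP.js also does not currently support more complex features. These are part of the ongoing development of chatbot orchestration, but at the time of writing that feature is still experimental.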

Security and privacy considerations

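When using this solution, keep in mind that the entire corpus and its functionality are available in the browser to anyone who visits your website. Anyone can simply download your corpus, manipulate it, and use it in other ways. Make sure that what you ship to the browser does not expose any private information.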

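A browser-only solution has certain advantages, but it also takes away some opportunities: you still need some kind of backend if you want to record what users talk about with your chatbot. At the same time, if you log whole conversations, consider the privacy implications, especially under legislation such as GDPR.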
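
As a rough sketch of that trade-off, the code below builds a log entry that keeps the detected intent but leaves out the raw text the user typed, and posts it to a backend. Everything here is an assumption for illustration: buildLogEntry and sendLogEntry are hypothetical helpers, the /api/chat-log endpoint does not exist in this project, and the score field is assumed to be present on the result object alongside the intent and answer fields used earlier.

```javascript
// Hypothetical helper: keeps only what is needed for analytics and leaves out
// the raw utterance, which may contain personal data.
function buildLogEntry(response, now = new Date()) {
  return {
    timestamp: now.toISOString(),
    intent: response.intent,
    score: response.score, // assumed field on the NLP.js result object
    answered: typeof response.answer === 'string' && response.answer !== '',
  };
}

// Sends the entry to a backend. The '/api/chat-log' endpoint is an assumption;
// you would need to provide it yourself.
async function sendLogEntry(entry, endpoint = '/api/chat-log') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(entry),
  });
  return res.ok;
}

// Usage inside onChatSubmit, after nlp.process(...):
// sendLogEntry(buildLogEntry(response)).catch(console.error);
```

Whether dropping the utterance is enough depends on your legal situation; this only shows that what gets logged is a design decision you can make explicitly.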
