An Unconventional Approach to SEO for Single-Page Applications (SPA)

Date: 2019-11-10


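There are several well-known ways to make a single-page application SEO-friendly: server-side rendering (SSR), prerendering, relying on Google to crawl AJAX content, static generation, and so on. Each has its strengths and weaknesses, and developers can pick one based on their business scenario and environment. This article introduces another, rather unconventional SEO approach. It has its own pros and cons, so it is up to you to judge whether it fits.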

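Project analysis

My project is a single-page application built on the react + ts + dva stack. It already has dozens of pages in production, with a number of SDKs and plugins integrated.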

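I considered server-side rendering for SEO, but the project is already this far along. Build configuration, code splitting, syntax compatibility, removing browser-only objects, adopting a server-side mindset: with so many points to consider, I might as well switch frameworks and start over. The retrofit cost is too high 😱, so server-side rendering does not suit my situation.
Prerendering has the lowest development cost, but it still generates static HTML files one by one. My SEO requirement is that spiders can crawl every post in my community forum, which means one HTML file per post, plus pagination. Storing that many files would be a heavy burden 😰, and updating the site would be even more painful, so this option does not fit well either.
Google... Emmmm... next.
Static generation is much the same as prerendering...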

Introducing the approach

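I previously wrote about an SEO approach for single-page applications: prerender locally with a crawler to generate static HTML in the same directory structure, then have the front-end server check the request's UA and, when it belongs to a search engine spider, forward the request to the pre-generated static HTML page.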

That project was just a simple corporate site with a handful of pages, so prerendering worked fine.

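Following the same idea, all that is needed is to detect the search engine spider and show it a different page that already contains the data.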

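As for what the page looks like, the spider 🕷 does not care. It is like placing an ad with an advertiser: they do not dictate your theme or color palette. As long as you follow their size and requirements, money and goods change hands and that is that 🤑.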

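So we can build a second site dedicated to SEO: no styles, only SEO-compliant HTML tags and the corresponding data. The existing project needs no changes, the development cost is modest, and the pages are small and load faster.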

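There is a drawback: a second site has to be maintained. Visual changes to the main site have no effect on it, but whenever the displayed data changes, the SEO version must be updated to match.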

Implementation

First create a separate seo folder, so the existing project is left untouched. All the files below live inside it.

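The implementation is very simple: write a middleware that intercepts requests, identifies spiders, and returns the SEO page for the matching path.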

My front-end server uses express, so this can be an express middleware. Create server.js:

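// seo/server.js
const routes = require('./routes')
// layout.js sits next to this file (seo/layout.js)
const layout_render = require('./layout')

// UA patterns of the major search engine spiders (case-insensitive)
const spiderUA = /Baiduspider|bingbot|Googlebot|360spider|Sogou|Yahoo! Slurp/i

module.exports = (req, res, next) => {
  const isSpider = spiderUA.test(req.get('user-agent') || '')
  // Not a spider: continue with the remaining middleware
  if (!isSpider) return next()

  // Find the first route pattern that matches the request path
  const route = Object.keys(routes).find((pattern) => new RegExp(pattern).test(req.path))
  // A spider requested a path that has no SEO page: fall through to the main site
  if (!route) return next()

  routes[route](req)
    .then((result) => {
      // Return the rendered template to the spider
      res.status(200).set('Content-Type', 'text/html; charset=utf-8').send(layout_render(result))
    })
    .catch(next) // hand request or rendering errors to express's error handling
}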

Then add this middleware to the front-end server's startup file. Remember to register it before the other middleware.

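// The front-end server's startup file
const express = require('express')
const app = express()
// seo: registered first (the relative path assumes the seo folder sits next to this file)
app.use(require('./seo/server'))
......

app.listen(xxxx)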

Next come the templates and the code that renders them. Create a home folder containing index.ejs and index.js:

<!-- seo/src/home/index.ejs -->
<div>
  <h1>Official site home page</h1>
  <p>Friendly links:</p>
  <p><a href="https://www.baidu.com/" target="_blank">Baidu</a></p>
  <p><a href="https://www.google.com/" target="_blank">Google</a></p>
</div>

index.js renders the corresponding ejs template:

// seo/src/home/index.js
const ejs = require('ejs')
const fs = require('fs')
const path = require('path')
const template = fs.readFileSync(path.resolve(__dirname, './index.ejs'), 'utf8');

// The reason for the async keyword becomes clear further down.
module.exports = async (req) => {
  const result = ejs.render(template)
  return result
}

We can also create a layout template to manage shared elements such as the head, the title, and the navigation bar:

<!-- seo/layout.ejs -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="content-type" content="text/html;charset=utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="renderer" content="webkit">
  <meta content="site keywords" name="keywords"/>
  <meta content="site description" name="description"/>
  <title>Site title</title>
</head>
<body>
  <div id="root">
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/community">Community</a></li>
    </ul>
    <%- children -%>
  </div>
</body>
</html>

layout_render renders layout.ejs and inserts the page content into it:

// seo/layout.js
const ejs = require('ejs')
const fs = require('fs')
const path = require('path')
const template = fs.readFileSync(path.resolve(__dirname, './layout.ejs'), 'utf8');

const layout_render = (children) => {
  return ejs.render(template, {children: children})
}
module.exports = layout_render

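A plain key-value object is enough for the route table. Each key is a regular expression in string form that describes which paths match: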

// seo/routes.js
const home_route = require('./src/home/index')

module.exports = {
  '^(/?)$': home_route,
}

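So how do we request data and display it in the template? Requests are asynchronous, so how do we wait for them to finish before rendering?

We can use async/await. Let's build the community post list page, which needs to fetch the list of posts first and then render the data into the template. Create a community folder with an index.ejs as the post list template: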

<!-- seo/src/community/index.ejs -->
<div>
  <h1>Post list</h1>
  <ul>
    <% forum_list.map((item) => { %>
    <li><a href="/community/<%= item.id%>" target="_blank"><%= item.title-%></a></li>
    <% })%>
  </ul>
</div>

The API request and data handling go in the sibling index.js:

// seo/src/community/index.js
const ejs = require('ejs')
const fs = require('fs')
const path = require('path')
const template = fs.readFileSync(path.resolve(__dirname, './index.ejs'), 'utf8');
const axios = require('axios');

module.exports = async (req) => {
  const res = await axios.get('http://xxx.xx/api/community/list')
  const result = ejs.render(template, {forum_list: res.data.list})
  return result
}

Then add the matching route:

// seo/routes.js
const home_route = require('./src/home/index')
const community_route = require('./src/community/index')

module.exports = {
  '^(/?)$': home_route,
  '^/community$': community_route,
}

This way the API data is fetched before rendering, so a visiting spider receives complete data and HTML structure.

Next, a post detail page:

<!-- seo/src/community_detail/index.ejs -->
<div>
  <h1><%= forum_data.title%></h1>
  <p><%= forum_data.content%></p>
  <p>Author: <%= forum_data.user.nickname%></p>
</div>

// seo/src/community_detail/index.js
const ejs = require('ejs')
const fs = require('fs')
const path = require('path')
const template = fs.readFileSync(path.resolve(__dirname, './index.ejs'), 'utf8');
const axios = require('axios');

module.exports = async (req) => {
  // Extract the id from the path: /community/:id
  const forum_id = req.path.split('/')[2]
  const res = await axios.get(`http://xxx.xx/api/community/${forum_id}/details?offset=1&limit=10`)
  const result = ejs.render(template, {forum_data: res.data})
  return result
}
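
Splitting on `/` works for this route. A regular expression with a capture group is a slightly stricter alternative, because it yields nothing when the path has an unexpected shape. Below is a minimal sketch; `getForumId` is a hypothetical helper name and not part of the original code:

```javascript
// Hypothetical helper: pull the numeric post id out of /community/:id.
// Returns null when the path does not have exactly that shape.
function getForumId(path) {
  const match = /^\/community\/(\d+)$/.exec(path)
  return match ? match[1] : null
}

console.log(getForumId('/community/123'))      // 123
console.log(getForumId('/community/abc'))      // null
console.log(getForumId('/community/123/edit')) // null
```

Inside the handler this would replace the `split` call. Since the route pattern for this page only lets purely numeric ids through, the extra check is mostly a safeguard in case the handler is ever reused elsewhere.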

Likewise, add the matching route:

// seo/routes.js
const home_route = require('./src/home/index')
const community_route = require('./src/community/index')
const community_detail_route = require('./src/community_detail/index')

module.exports = {
  '^(/?)$': home_route,
  '^/community$': community_route,
  '^/community/\\d+$': community_detail_route,
}
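
The lookup against this table can be factored into a small helper. The following is a minimal sketch and not part of the original project: `compileRoutes` and `matchRoute` are hypothetical names, and the handlers are stand-in strings rather than real render functions. It compiles each key once at startup instead of building a `RegExp` on every request.

```javascript
// Hypothetical helpers (not in the original project): compile the string keys
// of a route table once, then look up the handler for a request path.
function compileRoutes(routes) {
  return Object.keys(routes).map((pattern) => ({
    regex: new RegExp(pattern),
    handler: routes[pattern],
  }))
}

function matchRoute(compiled, path) {
  // Routes are tried in declaration order; the first match wins.
  const hit = compiled.find((entry) => entry.regex.test(path))
  return hit ? hit.handler : null
}

// Stand-in handlers, only to show which pattern matches which path.
const compiled = compileRoutes({
  '^(/?)$': 'home',
  '^/community$': 'community',
  '^/community/\\d+$': 'community_detail',
})

console.log(matchRoute(compiled, '/'))             // home
console.log(matchRoute(compiled, '/community/42')) // community_detail
console.log(matchRoute(compiled, '/about'))        // null
```

A `null` result is the case where a spider requests a path that has no SEO page, so the middleware needs to decide what to do with it, for example by calling `next()`.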

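That gives us a simple SEO version of the site: no styles, and no JS for popups or other follow-up interactions. All that matters is that the first response to the spider's request contains the data it wants. Pretty unconventional, isn't it 😝...

In short, if your project is already live and has grown to a certain level of integration, SSR would cost too much to retrofit, and you still need some data (such as every article or post) to be indexed, this approach is worth considering 🤓.

That said, I cannot promise that the spiders' anti-cheating mechanisms will not filter out a small SEO site like this one, which differs substantially from what a browser sees on the main site 🤔. The approach is still experimental.

I have learned that there is an attack technique called "search engine hijacking". It works on a similar principle: malicious code planted in a site redirects visiting spiders to a different site, so the spider crawls that other site, while opening the domain directly in a browser looks normal. It relies on the assumption that a spider only collects the content it crawls and does not verify that it matches what a browser would see.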

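Testing

Testing is simple too: just simulate a spider request. curl, a crawler, or Postman can all send a spider's UA. Alternatively, tweak the spider check so the pages can be viewed directly in a browser.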

If anyone uses this method and it really does get pages indexed by search engines, please remember me 😎. A tip would be even better, haha 🤑.