Downloading and trimming square-dance videos with Node, to help my mom

Date: 2019-11-08


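Preface

My mom is a square-dance regular. She often asks me to download square-dance videos and cut out just the back-view demonstration or the step-by-step breakdown, and that is where the trouble starts.

🤔 Since I have to do this so often, I figured: why not write a Node script to make it easier?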

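Feature planning

My mom mainly uses two sites, 51广场舞 (51gcw) and 糖豆广场舞 (Tangdou). Tangdou also has a WeChat mini program.

So the tool has to download videos from both sites, and it also has to support trimming. Picture this usage scenario:

Typing a 51gcw or Tangdou link on the command line starts the download, and a loading indicator is shown while it runs
Once the download finishes, the command line prompts for a start time and an end time, and then the video is trimmed

Good, the requirements are clear. On to the analysis.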

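Analyzing 51gcw

Open any 51gcw video and you will see that the page already provides a download button. Inspecting it with the developer tools shows that the button's href is the video request URL, which makes things easy. One thing to note: this URL is not the one we type on the command line. What we type is the URL from the address bar, and that page is what we scrape!

Analyzing Tangdou

Open any Tangdou video and open the developer tools. The page uses an HTML5 video element, and the element's src is the video request URL.

So far everything looks smooth and the video URLs were easy to find. In practice I did get stuck for a while on the Tangdou download; details below.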

Implementation

Create a new folder, then initialize it with npm or yarn

mkdir dance-video-downloader
cd dance-video-downloader

yarn init

Create index.js; all of the code goes in this file

touch index.js

Install the modules we need; each one is explained later in the article

yarn add superagent cheerio ora inquirer fluent-ffmpeg

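Scraping the video URL

This step uses two scraping staples, superagent and cheerio, plus ora, a command-line loading spinner.

superagent is an HTTP client that we use to request the page content, and cheerio is roughly jQuery for Node.

The page address is passed on the command line, for example node index http://www.51gcw.com/v/26271.html, so we can read the URL from process.argv[2].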
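
As a rough sketch of that input handling, the snippet below reads the first command-line argument and decides which site it belongs to. The helper name detectSite and the hostname-based check are my own assumptions for illustration; the article's code does a simpler indexOf check on the raw string.

```javascript
// Hypothetical helper (not from the original code): classify a page URL
// by its hostname instead of searching the whole string.
function detectSite(rawUrl) {
  let hostname
  try {
    hostname = new URL(rawUrl).hostname
  } catch (e) {
    return null // missing argument or not a parseable absolute URL
  }
  if (hostname.includes('51gcw')) return '51gcw'
  if (hostname.includes('tangdou')) return 'tangdou'
  return null
}

// process.argv is [node path, script path, ...user arguments],
// so the first user-supplied argument sits at index 2.
const pageUrl = process.argv[2]
console.log(detectSite(pageUrl) || 'unsupported or missing URL')
```

Checking the hostname rather than the full string avoids a false match when the site name only appears in a path or query string.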

We first request the page, then use cheerio's DOM-style API to pick out the video URL. The code is as follows

// require() assumes a CommonJS release of ora (5.x or earlier); later majors are ESM-only
const superagent = require('superagent')
const cheerio = require('cheerio')
const ora = require('ora')

function run() {
  const scraping = ora('Fetching page...\n').start()

  superagent
    .get(process.argv[2])
    .end((err, res) => {
      if (err) {
        scraping.fail('Failed to fetch page')
        return console.log(err)
      }
      scraping.succeed('Page fetched\n')

      const downloadLink = getDownloadLink(res.text)
      console.log(downloadLink)
    })
}

function is51Gcw(url) {
  return url.indexOf('51gcw') > -1
}

function isTangDou(url) {
  return url.indexOf('tangdou') > -1
}

// Pull the video URL out of the page HTML
function getDownloadLink(html) {
  const $ = cheerio.load(html)
  const url = process.argv[2]
  let downloadLink
  if (is51Gcw(url)) {
    // 51gcw: href of the second link under .play_xz_mp4
    downloadLink = $('.play_xz_mp4 a').eq(1).attr('href')
  } else if (isTangDou(url)) {
    // Tangdou: src of the video element
    downloadLink = $('video').attr('src')
  }
  return downloadLink
}

run()


Let's test it, starting with 51gcw

Running node index http://www.51gcw.com/v/26271.html prints the video URL

Now try Tangdou: this time the output is undefined

Why? Let's print the fetched page and take a look

superagent
  .get(process.argv[2])
  .end((err, res) => {
    if (err) {
      return console.log(err)
    }
    scraping.succeed('Page fetched\n')

    // const downloadLink = getDownloadLink(res.text)
    console.log(res.text) // dump the raw HTML for inspection
  })

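It turns out that Tangdou renders its video with a player plugin. In other words, the initial page has no video tag at all, so $('video').attr('src') can never return anything.

Looking at the HTML more closely, the address is tucked away inside an object, and since the page content is really just a string, I decided to extract the URL with a regular expression

Rewrite the function that gets the URL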

function getDownloadLink(html) {
  const $ = cheerio.load(html)
  const url = process.argv[2]
  let downloadLink
  if (is51Gcw(url)) {
    downloadLink = $('.play_xz_mp4 a').eq(1).attr('href')
  } else if (isTangDou(url)) {
    // Tangdou: the URL only appears inside a script, so match it as plain text
    const match = /video:\s?'(https?\:\/\/\S+)'/.exec(html)
    downloadLink = match && match[1]
  }
  return downloadLink
}

OK, now we can get the URL
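
To see how that regular expression behaves, here is a small sketch that runs it against a made-up page fragment. The article does not show the real Tangdou markup, so the player config and the example.com URL below are placeholders, and extractVideoUrl is a hypothetical wrapper name.

```javascript
// The same pattern used in getDownloadLink.
const VIDEO_URL_REG = /video:\s?'(https?\:\/\/\S+)'/

// Made-up fragment: a script that configures a player with a video URL.
const sampleHtml = `
<script>
  var player = new Player({
    id: 'player',
    video: 'https://example.com/videos/demo.mp4',
    autoplay: true
  })
</script>`

function extractVideoUrl(html) {
  const match = VIDEO_URL_REG.exec(html)
  // exec returns null when nothing matches; the URL is capture group 1
  return match ? match[1] : null
}

console.log(extractVideoUrl(sampleHtml))        // https://example.com/videos/demo.mp4
console.log(extractVideoUrl('<video></video>')) // null
```

One caveat: \S+ is greedy, so if another single-quoted value followed on the same line with no whitespace in between, the capture would run on to the last quote. A tighter character class such as [^'\s]+ would stop at the first closing quote.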

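Downloading the video

superagent is just an HTTP client, so we can download the file with it directly

Once we have the URL, we pass it to downloadVideo to download it. The code is as follows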

const fs = require('fs')
const DOWNLOAD_PATH = 'gcw.mp4'

function downloadVideo(downloadLink) {
  if (!downloadLink) {
    console.log('Failed to get the download link')
    return
  }
  console.log(`${downloadLink}\n`)
  const downloading = ora('Downloading video...\n').start()

  const file = fs.createWriteStream(DOWNLOAD_PATH)
  // 'close' fires once the file has been fully written
  file.on('close', () => {
    downloading.succeed('Video downloaded\n')

    // cutVideo()
  })

  // Stream the response body straight into the file
  superagent
    .get(downloadLink)
    .pipe(file)
}

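Test it: the video downloads successfully

Trimming the video

Once the video is downloaded, we need to trim it. Two tools are used here.

One is Inquirer, for command-line interaction: it asks for the start time and the end time

For the trimming itself I use node-fluent-ffmpeg, which simply calls ffmpeg from Node, so ffmpeg has to be installed on the machine. It is a tool I use all the time and happily recommend: it is very powerful, and from the command line it can convert video formats, turn images into video, split video, and more. It is exactly the kind of tool a programmer should be using 😎

Reading the node-fluent-ffmpeg docs, I found that it provides setStartTime() but no setEndTime(); the only option is setDuration(), which takes the length of the clip in seconds

But I can't expect anyone to enter a start time, work out the number of seconds to the end time by hand, and then type that number in. So I still ask for an end time, use ffprobe in the code to get the total length of the video (as the default end time), and compute the number of seconds from start to end. This needs two time-conversion helpers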
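
As a worked example of that arithmetic, the sketch below converts two timestamps to seconds and subtracts them. The times are example values only.

```javascript
// Convert "HH:mm:ss" to a number of seconds.
function hmsToSeconds(hms) {
  const [h, m, s] = hms.split(':').map(Number)
  return h * 3600 + m * 60 + s
}

// Example values only: keep the part from 00:01:30 to 00:03:45.
const startSecond = hmsToSeconds('00:01:30') // 90
const endSecond = hmsToSeconds('00:03:45')   // 225

// The clip length handed to setDuration() is end minus start.
const cutDuration = endSecond - startSecond
console.log(cutDuration) // 135
```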

After the download finishes, we call cutVideo, as follows

const ffmpeg = require('fluent-ffmpeg')
const inquirer = require('inquirer')

/**
 * Convert HH:mm:ss to seconds
 * @param {string} hms time in HH:mm:ss format
 */
function hmsToSeconds(hms) {
  const hmsArr = hms.split(':')
  return (+hmsArr[0]) * 60 * 60 + (+hmsArr[1]) * 60 + (+hmsArr[2])
}

/**
 * Convert seconds to HH:mm:ss (for durations under 24 hours)
 * @param {number} seconds number of seconds
 */
function secondsToHms(seconds) {
  return new Date(seconds * 1000).toISOString().slice(11, 19)
}

const CUT_RESULT_PATH = 'gcw_cut.mp4'

function cutVideo() {
  inquirer.prompt([
    {
      type: 'confirm',
      name: 'needCut',
      message: 'Trim the video?',
      default: true
    },
    {
      type: 'input',
      name: 'startTime',
      message: 'Start time, defaults to 00:00:00 (HH:mm:ss)',
      default: '00:00:00',
      when: ({ needCut }) => needCut
    },
    {
      type: 'input',
      name: 'endTime',
      message: 'End time, defaults to the end of the video (HH:mm:ss)',
      when: ({ needCut }) => needCut
    }
  ]).then(({ needCut, startTime, endTime }) => {
    if (!needCut) {
      process.exit()
    }

    // ffprobe reads the video metadata, including its total duration in seconds
    ffmpeg.ffprobe(DOWNLOAD_PATH, (err, metadata) => {
      if (err) {
        return console.log(err)
      }
      const videoDuration = metadata.format.duration
      endTime = endTime || secondsToHms(videoDuration) // default to the end of the video
      const startSecond = hmsToSeconds(startTime)
      const endSecond = hmsToSeconds(endTime)
      const cutDuration = endSecond - startSecond

      console.log(`\nStart time: ${startTime}`)
      console.log(`End time: ${endTime}`)
      console.log(`Start time (s): ${startSecond}`)
      console.log(`End time (s): ${endSecond}`)
      console.log(`Trimmed length (s): ${cutDuration}\n`)

      const cutting = ora('Trimming video...\n').start()
      ffmpeg(DOWNLOAD_PATH)
        .setStartTime(startTime)
        .setDuration(cutDuration)
        .on('error', (cutErr) => {
          cutting.fail(`Trimming failed: ${cutErr.message}`)
        })
        .on('end', () => {
          cutting.succeed(`Video trimmed, output saved to ${CUT_RESULT_PATH}`)
        })
        .saveToFile(CUT_RESULT_PATH)
    })
  })
}
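
One gap in the prompts is that a malformed time such as 1:30 makes hmsToSeconds return NaN. As a possible guard, here is a sketch of a validator; validateHms is a hypothetical helper that is not part of the original tool, and the accepted format is my assumption based on the HH:mm:ss hint in the prompt messages.

```javascript
// Hypothetical validator (not part of the original tool): accept only
// HH:mm:ss with minutes and seconds in the 00-59 range.
function validateHms(input) {
  const match = /^(\d{1,2}):([0-5]\d):([0-5]\d)$/.exec(input.trim())
  if (!match) {
    return 'Please use the HH:mm:ss format, e.g. 00:01:30'
  }
  return true
}

console.log(validateHms('00:01:30')) // true
console.log(validateHms('1:30'))     // error message string
console.log(validateHms('00:61:00')) // error message string
```

Inquirer questions accept a validate function that returns true or an error message, so this could be attached as validate: validateHms on the start-time question. The end-time question would also need to let an empty answer through, since blank means "end of the video".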

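Development complete

That's it for development. We can wrap everything up in a singleton object to make the code a little more elegant 😂. The complete code is below, and you can also find it on my GitHub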

// require() assumes CommonJS releases (ora 5.x, inquirer 8.x or earlier); later majors are ESM-only
const fs = require('fs')
const superagent = require('superagent')
const cheerio = require('cheerio')
const ora = require('ora')
const inquirer = require('inquirer')
const ffmpeg = require('fluent-ffmpeg')

const utils = {
  /**
   * Convert HH:mm:ss to seconds
   * @param {string} hms time in HH:mm:ss format
   */
  hmsToSeconds(hms) {
    const hmsArr = hms.split(':')

    return (+hmsArr[0]) * 60 * 60 + (+hmsArr[1]) * 60 + (+hmsArr[2])
  },

  /**
   * Convert seconds to HH:mm:ss (for durations under 24 hours)
   * @param {number} seconds number of seconds
   */
  secondsToHms(seconds) {
    return new Date(seconds * 1000).toISOString().slice(11, 19)
  }
}

const downloader = {
  url: process.argv[2],
  VIDEO_URL_REG: /video:\s?'(https?\:\/\/\S+)'/,
  DOWNLOAD_PATH: 'gcw.mp4',
  CUT_RESULT_PATH: 'gcw_cut.mp4',

  run() {
    if (!this.url) {
      console.log('Please enter a 51广场舞 or 糖豆广场舞 address')
      return
    }

    const scraping = ora('Fetching page...\n').start()

    superagent
      .get(this.url)
      .end((err, res) => {
        if (err) {
          scraping.fail('Failed to fetch page')
          return console.log(err)
        }
        scraping.succeed('Page fetched\n')

        const downloadLink = this.getDownloadLink(res.text)
        this.downloadVideo(downloadLink)
      })
  },

  is51Gcw(url) {
    return url.indexOf('51gcw') > -1
  },

  isTangDou(url) {
    return url.indexOf('tangdou') > -1
  },

  getDownloadLink(html) {
    const $ = cheerio.load(html)
    let downloadLink
    if (this.is51Gcw(this.url)) {
      // 51gcw: href of the second link under .play_xz_mp4
      downloadLink = $('.play_xz_mp4 a').eq(1).attr('href')
    } else if (this.isTangDou(this.url)) {
      // Tangdou: the URL only appears inside a script, so match it as plain text
      const match = this.VIDEO_URL_REG.exec(html)
      downloadLink = match && match[1]
    }
    return downloadLink
  },

  downloadVideo(downloadLink) {
    if (!downloadLink) {
      console.log('Failed to get the download link')
      return
    }
    console.log(`${downloadLink}\n`)
    const downloading = ora('Downloading video...\n').start()

    const file = fs.createWriteStream(this.DOWNLOAD_PATH)
    // 'close' fires once the file has been fully written
    file.on('close', () => {
      downloading.succeed('Video downloaded\n')

      this.cutVideo()
    })

    // Stream the response body straight into the file
    superagent
      .get(downloadLink)
      .pipe(file)
  },

  cutVideo() {
    inquirer.prompt([
      {
        type: 'confirm',
        name: 'needCut',
        message: 'Trim the video?',
        default: true
      },
      {
        type: 'input',
        name: 'startTime',
        message: 'Start time, defaults to 00:00:00 (HH:mm:ss)',
        default: '00:00:00',
        when: ({ needCut }) => needCut
      },
      {
        type: 'input',
        name: 'endTime',
        message: 'End time, defaults to the end of the video (HH:mm:ss)',
        when: ({ needCut }) => needCut
      }
    ]).then(({ needCut, startTime, endTime }) => {
      if (!needCut) {
        process.exit()
      }

      // ffprobe reads the video metadata, including its total duration in seconds
      ffmpeg.ffprobe(this.DOWNLOAD_PATH, (err, metadata) => {
        if (err) {
          return console.log(err)
        }
        const videoDuration = metadata.format.duration
        endTime = endTime || utils.secondsToHms(videoDuration) // default to the end of the video
        const startSecond = utils.hmsToSeconds(startTime)
        const endSecond = utils.hmsToSeconds(endTime)
        const cutDuration = endSecond - startSecond

        console.log(`\nStart time: ${startTime}`)
        console.log(`End time: ${endTime}`)
        console.log(`Start time (s): ${startSecond}`)
        console.log(`End time (s): ${endSecond}`)
        console.log(`Trimmed length (s): ${cutDuration}\n`)

        const cutting = ora('Trimming video...\n').start()
        ffmpeg(this.DOWNLOAD_PATH)
          .setStartTime(startTime)
          .setDuration(cutDuration)
          .on('error', (cutErr) => {
            cutting.fail(`Trimming failed: ${cutErr.message}`)
          })
          .on('end', () => {
            cutting.succeed(`Video trimmed, output saved to ${this.CUT_RESULT_PATH}`)
          })
          .saveToFile(this.CUT_RESULT_PATH)
      })
    })
  }
}

downloader.run()

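Trying it out

Task received

杨丽萍广场舞醉人的花香

Only the back-view demonstration is needed

On it

Find the dance: www.51gcw.com/v/35697.htm… then enter the command

While the download is running, go and check the start and end times of the back-view demonstration, then enter them once the download is done

Then wait for the trim to finish!

So much more efficient than before! 🎉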