itchat Personal Practice: Voice and Text Tuling Robot Example

Date: 2019-12-08

Tags: itchat, personal practice, voice, text, Tuling test example


Background

itchat is an open-source interface for personal WeChat accounts. With it, calling WeChat from Python has never been easier.

With fewer than thirty lines of code, you can build a WeChat bot that handles every kind of message.

Official documentation: https://itchat.readthedocs.io/zh/latest/

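I recently needed to build an auto-reply bot. It receives user messages (the GUI part), then uses semantic analysis and machine learning to produce an answer.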

Preparation

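Install ffmpeg. Find the official site with a search engine, download the Windows build, unzip it, and add its bin directory to the system PATH environment variable.
Then install pydub and SpeechRecognition with pip: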

pip install pydub
pip install SpeechRecognition
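A quick way to confirm the setup is sketched below, using only the standard library. `shutil.which` looks for ffmpeg on PATH, and `importlib.util.find_spec` checks that both packages can be imported. The function name `check_environment` is my own. Note that the SpeechRecognition package is imported as `speech_recognition`.

```python
import importlib.util
import shutil


def check_environment():
    """Hypothetical helper: report whether ffmpeg and the two packages are available."""
    return {
        # pydub relies on ffmpeg (or avconv) to decode formats such as MP3
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "pydub": importlib.util.find_spec("pydub") is not None,
        # the SpeechRecognition package installs the module speech_recognition
        "speech_recognition": importlib.util.find_spec("speech_recognition") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If `ffmpeg` shows as missing right after editing PATH, open a new terminal so the updated environment variable is picked up.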

Binding messages

The GUI part uses WeChat's itchat interface. For installation and a beginner tutorial, see the official documentation.

A reply handler for voice messages is bound like this:

@itchat.msg_register(RECORDING)
def tuling_reply(msg):

RECORDING can be used without a prefix here because the code begins with from itchat.content import *. Otherwise you would have to write itchat.content.RECORDING.

There is plenty of material online about what the @ decorator syntax does. Here are my own notes:

    @de
    def func1(): ...
    ----- is equivalent to -----
    def func1(): ...
    func1 = de(func1)

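When the Python interpreter reaches the decorator "@", it proceeds as follows:

1. It creates the function object func1 and calls de with that function as its argument;

2. Whatever de returns is bound to the name func1. func1 itself is not executed at this point. It runs only when it is called later, or if de calls it internally;

In other words, the argument the decorator receives is the entire function defined beneath it.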

References: https://blog.csdn.net/972301/article/details/59537712 and https://blog.csdn.net/fwenzhou/article/details/8733857
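You can check this equivalence directly. In the small sketch below, `de`, `func1`, `func2` and the `calls` list are illustrative names. The decorator records when it runs, and the output shows that `de` runs at definition time while `func1` runs only when it is called.

```python
calls = []  # records the order in which things happen


def de(func):
    # Runs once, when the decorated function is defined; func is NOT called here
    calls.append("de received " + func.__name__)
    return func  # whatever de returns is bound to the decorated name


@de
def func1():
    calls.append("func1 ran")
    return "hello"


def func2():
    calls.append("func2 ran")
    return "hello"


func2 = de(func2)  # the explicit form of @de

print(calls)    # ['de received func1', 'de received func2']
print(func1())  # hello  (only now does func1 actually run)
print(calls)    # ['de received func1', 'de received func2', 'func1 ran']
```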

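So when we write @itchat.msg_register(RECORDING), the call itchat.msg_register(RECORDING) runs first and returns a decorator. That decorator then receives our tuling_reply function as its argument and registers it. This is why itchat hands every incoming voice message to this function.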
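The same pattern, with one extra level for the argument, can be sketched as a tiny message registry. This is a simplified, hypothetical stand-in for illustration only, not itchat's actual source. `msg_register`, `dispatch` and the `_handlers` dict are my own names, and the message dicts only mimic itchat's `Type`/`FileName` keys.

```python
from collections import defaultdict

# Hypothetical registry: message type -> list of handler functions
_handlers = defaultdict(list)


def msg_register(msg_type):
    """Decorator factory: msg_register("Recording") returns the real decorator."""
    def decorator(func):
        _handlers[msg_type].append(func)  # remember func; it is not called here
        return func                       # keep the name bound to the original function
    return decorator


def dispatch(msg):
    """Pass an incoming message to every handler registered for its type."""
    return [handler(msg) for handler in _handlers[msg["Type"]]]


# Same as: tuling_reply = msg_register("Recording")(tuling_reply)
@msg_register("Recording")
def tuling_reply(msg):
    return "got a voice message: " + msg["FileName"]


print(dispatch({"Type": "Recording", "FileName": "a.mp3"}))  # ['got a voice message: a.mp3']
print(dispatch({"Type": "Text", "Text": "hi"}))              # [] (no handler for Text)
```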

Speech recognition

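WeChat saves voice messages as MP3 files. Of the services I looked at, only Tencent's speech recognition accepted MP3, so I first tried Tencent's one-sentence recognition API. However, there was no up-to-date official example, and different parts of the documentation described different API versions, so my authentication kept failing. After studying the docs carefully I wrote my own code, and authentication then seemed to succeed. The response, however, was a Chinese character like x'\98' that failed to decode. That is when I realized Tencent's service probably supports only Chinese. The example in this post recognizes Chinese speech, but my actual project needs English speech recognition. I still learned a few things along the way, such as how to use the HMAC signing algorithm and how Python 3 converts between bytes and strings.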

import base64
import binascii
import hashlib
import hmac
import json
import os
import random
import time
import urllib.parse
import urllib.request

# Defined elsewhere in the original program and not shown here:
# actionData, secretId, secretKey, requestMethod, uriData, EngSerViceTypeData,
# SourceTypeData, SubServiceTypeData, UsrAudioKeyData, versionData, VoiceFormatData,
# and the helper dictToStr(), which (judging from its use) joins the parameters
# as key=value pairs separated by &.

def asr(msg):
    msg['Text'](msg['FileName'])  # itchat: calling msg['Text'] saves the voice message (mp3) to this file
    timeData = str(int(time.time()))  # timestamp
    # Nonce: per the official docs, a random positive integer that, together with Timestamp, prevents replay attacks
    nonceData = random.randint(1, 10000)
    with open(msg['FileName'], 'rb') as f:
        voiceData = f.read()  # raw mp3 content as a bytes object (b'\x..')
    os.remove(msg['FileName'])  # delete the mp3 file
    DataLenData = len(voiceData)  # file length before base64 encoding
    tmp = int(timeData)  # timestamp
    # Keys must be in ascending ASCII order, which is not necessarily alphabetical order;
    # sorted(signDictData.keys()) shows the ASCII order
    signDictData = {
        'Action': actionData,
        'Data': base64.b64encode(voiceData).decode('utf8'),  # b64encode returns bytes; decode turns them into str
        'DataLen': DataLenData,
        'EngSerViceType': EngSerViceTypeData,
        'Nonce': nonceData,
        'ProjectId': 0,
        'Region': 'ap-shanghai',
        'SecretId': secretId,
        # 'SignatureMethod': 'HmacSHA256',  # optional; HmacSHA1 is used if omitted
        'SourceType': SourceTypeData,
        'SubServiceType': SubServiceTypeData,
        'Timestamp': tmp,
        'UsrAudioKey': UsrAudioKeyData,
        'Version': versionData,
        'VoiceFormat': VoiceFormatData
    }
    # request method + host + path + ? + query string
    requestStr = "%s%s%s%s%s" % (requestMethod, uriData, "/", "?", dictToStr(signDictData))
    # The signature must be computed over requestStr WITHOUT URL encoding.
    # encode() turns str into bytes for hmac; digest() returns the raw SHA-1 bytes;
    # b2a_base64 base64-encodes them and appends '\n', which [:-1] strips; decode() yields a str.
    signData = binascii.b2a_base64(hmac.new(secretKey.encode('utf-8'), requestStr.encode('utf-8'), hashlib.sha1).digest())[:-1].decode()
    # Signing is done; now send the request.
    # The request parameters are the signed ones plus Signature.
    # Note: actionArgs = signDictData would NOT copy anything; both names would refer to the same dict.
    actionArgs = dict(signDictData)
    actionArgs['Signature'] = signData
    # build the request URL from the host
    requestUrl = "https://%s/?" % (uriData)
    # append the parameters; urlencode escapes characters such as / and =
    requestUrlWithArgs = requestUrl + urllib.parse.urlencode(actionArgs)

    # get the response
    responseData = urllib.request.urlopen(requestUrlWithArgs).read().decode("utf-8")
    # return json.loads(responseData)["Response"]["Error"]["Message"]  # on error
    return json.loads(responseData)["Response"]["Result"]  # on success

Reading the voice file and recognizing it with the Tencent API
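The signing step can be isolated and tested without calling Tencent at all. This sketch assumes that the author's `dictToStr` (not shown above) joins the parameters as `key=value` pairs with `&` in ASCII order. `dict_to_str`, `sign_request`, the host `example.com` and the parameter values are all placeholders. Python's `sorted()` compares strings by code point, so uppercase keys come before lowercase ones, which is the ASCII order the comment above asks for. The sketch also shows the bytes/str round trip used for the `Data` field.

```python
import base64
import hashlib
import hmac


def dict_to_str(params):
    """Hypothetical stand-in for dictToStr: 'k1=v1&k2=v2' with keys in ASCII order."""
    return "&".join(f"{key}={params[key]}" for key in sorted(params))


def sign_request(secret_key, method, host, params):
    """HMAC-SHA1 over method + host + '/?' + sorted params, returned as a base64 str."""
    # The string to sign uses the raw (not URL-encoded) parameter values
    string_to_sign = f"{method}{host}/?{dict_to_str(params)}"
    # hmac works on bytes, so both str inputs are encoded first
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()     # raw bytes (20 bytes for SHA-1)
    return base64.b64encode(digest).decode("ascii")  # bytes -> str


# bytes <-> str round trip, as done for the audio 'Data' field
audio_bytes = b"\x00\x01\xff"
data_field = base64.b64encode(audio_bytes).decode("utf-8")  # a str, safe to put in a URL
assert base64.b64decode(data_field) == audio_bytes

# Placeholder values only; not real credentials or a real endpoint
params = {"Nonce": 1234, "Action": "ExampleAction", "Timestamp": 1575763200}
print(dict_to_str(params))  # Action=ExampleAction&Nonce=1234&Timestamp=1575763200
print(sign_request("my-secret-key", "GET", "example.com", params))
```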

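After that I kept looking for other speech APIs. Baidu has the most reference material. There I noticed that, to send audio to the Baidu speech API, people use pydub to transcode the original audio file so it can be sent as WAV. Since my goal was English speech recognition, I still wanted to try APIs from foreign companies.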

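I tried Microsoft's speech recognition, the 7-day free trial. However, the official documentation has few references for the REST interface, and none of them are in Python. Then I found the SpeechRecognition project on GitHub. I had assumed it only wrapped Google's speech recognition. When I tried that, it was indeed blocked in China, and even through a proxy I could not reach it. Then I read the "Transcribe an audio file" example on the project's GitHub page and found it supports more than one service, including an example for Microsoft Bing Voice Recognition. Calling it is very simple: it needs only the audio file and a key. It also converts the audio for you into the format the Bing API expects. You can open the definition of r.recognize_bing() to see a detailed description of how to use the Bing speech service; it is copied below for reference: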

"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Microsoft Bing Speech API.

The Microsoft Bing Speech API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account <https://azure.microsoft.com/en-ca/pricing/details/cognitive-services/speech-api/>`__ with Microsoft Azure.

To get the API key, go to the `Microsoft Azure Portal Resources <https://portal.azure.com/>`__ page, go to "All Resources" > "Add" > "See All" > Search "Bing Speech API > "Create", and fill in the form to make a "Bing Speech API" resource. On the resulting page (which is also accessible from the "All Resources" page in the Azure Portal), go to the "Show Access Keys" page, which will have two API keys, either of which can be used for the `key` parameter. Microsoft Bing Speech API keys are 32-character lowercase hexadecimal strings.

The recognition language is determined by ``language``, a BCP-47 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#recognition-language>`__ under "Interactive and dictation mode".

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#sample-responses>`__ as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
"""

Bing speech recognition usage notes

So all we need is a valid key, and then we can call this function. Note that Chinese speech recognition requires passing language="zh-CN".

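Also note that Microsoft's 1-yuan cloud trial promotion does not include the Bing speech recognition module, so you need to use the global (standard) Azure site. A free trial account there requires a VISA or Mastercard credit card. Alternatively, you can sign up with a company account that has Office services, and then no credit card details are needed.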

Code

The full code is as follows:

# -*- coding: UTF-8 -*-
import requests
import itchat
import json
from itchat.content import *
import os
import speech_recognition as sr
from pydub import AudioSegment

def get_response_tuling(msg):
    # As in section 3, "the simplest interaction with the Tuling robot",
    # build the data to send to the server
    apiUrl = 'http://www.tuling123.com/openapi/api'
    data = {
        'key'    : '8edce3ce905a4c1dbb965e6b35c3834d',
        'info'   : msg,
        'userid' : 'wechat-robot',
    }
    try:
        r = requests.post(apiUrl, data=data).json()
        # dict.get returns None instead of raising an exception if there is no 'text' key
        return r.get('text')
    # Catch errors so an abnormal server response does not crash the program:
    # if the server cannot be reached or does not return JSON, we end up here
    except Exception:
        # return None so the caller can fall back to a default reply
        return None

def asr(msg):
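    # Recognize a voice message and return it as text (None on failure)
    msg['Text'](msg['FileName'])  # itchat: save the voice message (mp3) to this file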
    song = AudioSegment.from_mp3(msg['FileName'])
    song.export("tmp.wav", format="wav")
    r = sr.Recognizer()
    with sr.AudioFile('tmp.wav') as source:
        audio = r.record(source) # read the entire audio file
    os.remove('tmp.wav')
    os.remove(msg['FileName'])
    # recognize speech using Microsoft Bing Voice Recognition
    BING_KEY = "====== replace with your own key ======"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
    try:
        text = r.recognize_bing(audio, key=BING_KEY,language="zh-CN")
        print("Microsoft Bing Voice Recognition thinks you said " + text)
        return text
    except sr.UnknownValueError:
        print("Microsoft Bing Voice Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

@itchat.msg_register(TEXT)  # TEXT is available because everything in itchat.content was imported
def tuling_reply_text(msg):
    # Handler for incoming text messages
    # Default reply, so the bot still answers if the Tuling key has a problem
    defaultReply = 'I received a: ' + msg['Text']
    return get_response_tuling(msg['Text']) or defaultReply

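@itchat.msg_register(RECORDING)
def tuling_reply(msg):
    # Handler for incoming voice messages
    # Default reply, so the bot still answers if the Tuling key has a problem
    defaultReply = 'I received a: ' + msg['Type']

    asrMessage = asr(msg)
    if not asrMessage:
        # Speech recognition failed, so there is nothing to send to Tuling
        return defaultReply
    # If the Tuling key has a problem, get_response_tuling returns None
    return get_response_tuling(asrMessage) or defaultReply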

# To make experimenting easier (no need to rescan the QR code after every change), log in with hotReload=True
itchat.auto_login(hotReload=True)
itchat.run()
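Because the handlers depend on WeChat and two online services, the reply logic is hard to test as written. One option, sketched here with a hypothetical `make_voice_handler` helper, is to pass the recognizer and the chatbot in as plain functions so that stubs can replace them. The fallback works like `defaultReply` above.

```python
def make_voice_handler(recognize, chat):
    """Hypothetical helper: build a voice-message handler from two callables.

    recognize(msg) -> recognized text, or None on failure (like asr above)
    chat(text)     -> reply text, or None on failure (like get_response_tuling above)
    """
    def handler(msg):
        default_reply = "I received a: " + msg["Type"]
        text = recognize(msg)
        if not text:  # recognition failed, so do not send None to the chatbot
            return default_reply
        return chat(text) or default_reply  # chatbot failed: fall back as well
    return handler


# Stubs replace the speech and Tuling services: no network or WeChat login needed
echo_bot = make_voice_handler(lambda msg: "你好", lambda text: "echo: " + text)
print(echo_bot({"Type": "Recording"}))  # echo: 你好

deaf_bot = make_voice_handler(lambda msg: None, lambda text: "never used")
print(deaf_bot({"Type": "Recording"}))  # I received a: Recording
```

In the real bot, one could build the handler from `asr` and `get_response_tuling` and register it with `itchat.msg_register(RECORDING)(make_voice_handler(asr, get_response_tuling))`.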
