A development tutorial for the Watson Natural Language Understanding service: learn to use the IBM Bluemix Watson NLU service to analyze natural-language text.
You may need to install the Python SDK manually. The following command installs or updates it:
pip install --upgrade watson-developer-cloud
In [1]:
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import (
    Features, EntitiesOptions, KeywordsOptions)
The username and password for your own instance can be found in the service credentials.
In [2]:
username = '9461ca60-a53a-48d4-83da-89e52eeb2688'
password = 'qq4GhRKYyVAM'
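Hard-coding credentials in a notebook makes them easy to leak. A minimal sketch of reading them from environment variables instead; the variable names `NLU_USERNAME` and `NLU_PASSWORD` and the helper `load_credentials` are assumptions made for this illustration and are not part of the SDK.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from a mapping of environment variables.

    NLU_USERNAME and NLU_PASSWORD are hypothetical names chosen for this sketch.
    """
    try:
        return env["NLU_USERNAME"], env["NLU_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demonstration with an explicit mapping, so the sketch runs without a real environment.
username, password = load_credentials(
    {"NLU_USERNAME": "user", "NLU_PASSWORD": "secret"})
```

In real use, `load_credentials()` would be called without arguments so that the values come from `os.environ`.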
In [3]:
nlu = NaturalLanguageUnderstandingV1(version='2017-12-01', username=username, password=password)
In [4]:
features = Features(entities=EntitiesOptions(), keywords=KeywordsOptions())
Note that the Chinese-language version of this service is still under development, so it is recommended to test with English text.
In [5]:
texts = [
    '''Welcome to the official documentation of Godot Engine, the free and open source community-driven 2D and 3D game engine! If you are new to this documentation, we recommend that you read the introduction page to get an overview of what this documentation has to offer.''',
    '''Godot Engine is an open source project developed by a community of volunteers. It means that the documentation team can always use your feedback and help to improve the tutorials and class reference. If you do not manage to understand something, or cannot find what you are looking for in the docs, help us make the documentation better by letting us know!''',
]
In [6]:
nlu.analyze(features=features, text=texts[0])
Out[6]:
{'entities': [{'count': 1, 'relevance': 0.992572, 'text': 'Godot Engine', 'type': 'Company'}, {'count': 1, 'relevance': 0.338173, 'text': 'official', 'type': 'JobTitle'}], 'keywords': [{'relevance': 0.908999, 'text': 'Godot Engine'}, {'relevance': 0.741794, 'text': 'open source'}, {'relevance': 0.702832, 'text': 'introduction page'}, {'relevance': 0.674454, 'text': 'official documentation'}, {'relevance': 0.660084, 'text': 'game engine'}, {'relevance': 0.230546, 'text': 'overview'}], 'language': 'en', 'usage': {'features': 2, 'text_characters': 282, 'text_units': 1}}
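We can see that the service identified two entities: "Godot Engine", of type "Company", and "official", of type "JobTitle". It also returned a number of keywords, such as Godot Engine, open source, introduction page, and official documentation, and reported the number of characters and text units processed.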
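Because the response is a plain dictionary, the entities and keywords described above can be pulled out with ordinary Python. A small sketch, using a shortened copy of the Out[6] response as sample data; the helper name `summarize` is made up for this example.

```python
# Shortened copy of the Out[6] response shown above, used as sample data.
sample_response = {
    'entities': [
        {'count': 1, 'relevance': 0.992572, 'text': 'Godot Engine', 'type': 'Company'},
        {'count': 1, 'relevance': 0.338173, 'text': 'official', 'type': 'JobTitle'},
    ],
    'keywords': [
        {'relevance': 0.908999, 'text': 'Godot Engine'},
        {'relevance': 0.741794, 'text': 'open source'},
        {'relevance': 0.702832, 'text': 'introduction page'},
    ],
    'usage': {'features': 2, 'text_characters': 282, 'text_units': 1},
}


def summarize(response):
    """Return (entity_pairs, keyword_pairs) as lists of simple tuples."""
    entity_pairs = [(e['type'], e['text']) for e in response.get('entities', [])]
    keyword_pairs = [(k['text'], k['relevance']) for k in response.get('keywords', [])]
    return entity_pairs, keyword_pairs


entity_pairs, keyword_pairs = summarize(sample_response)
for entity_type, text in entity_pairs:
    print(f"{entity_type}: {text}")
```

Using `.get(..., [])` keeps the helper safe for responses that contain no entities at all.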
In [7]:
NLUA = lambda text: nlu.analyze(features=features, text=text)
In [8]:
NLUA(texts[1])
Out[8]:
{'entities': [], 'keywords': [{'relevance': 0.973304, 'text': 'open source project'}, {'relevance': 0.838422, 'text': 'Godot Engine'}, {'relevance': 0.624781, 'text': 'class reference'}, {'relevance': 0.60158, 'text': 'documentation team'}, {'relevance': 0.369586, 'text': 'docs'}, {'relevance': 0.32839, 'text': 'feedback'}, {'relevance': 0.28266, 'text': 'community'}, {'relevance': 0.282402, 'text': 'volunteers'}, {'relevance': 0.274328, 'text': 'tutorials'}], 'language': 'en', 'usage': {'features': 2, 'text_characters': 372, 'text_units': 1}}
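When several texts are analyzed, it can be convenient to combine their keywords into a single ranking. The sketch below merges the keyword lists and keeps the highest relevance seen for each keyword. `merge_keywords` is a hypothetical helper, and the sample data is a shortened copy of the Out[6] and Out[8] responses.

```python
def merge_keywords(responses):
    """Merge keywords from several responses, keeping the top relevance per keyword."""
    best = {}
    for response in responses:
        for keyword in response.get('keywords', []):
            text, relevance = keyword['text'], keyword['relevance']
            if relevance > best.get(text, 0.0):
                best[text] = relevance
    # Sort from most to least relevant.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Shortened copies of the two responses above.
sample_responses = [
    {'keywords': [{'relevance': 0.908999, 'text': 'Godot Engine'},
                  {'relevance': 0.741794, 'text': 'open source'}]},
    {'keywords': [{'relevance': 0.973304, 'text': 'open source project'},
                  {'relevance': 0.838422, 'text': 'Godot Engine'}]},
]
merged = merge_keywords(sample_responses)
```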
Next, use Python's Pandas data analysis framework to save the keyword information to a CSV file, so that it can be browsed with tools such as Excel.
In [9]:
import pandas as pd
In [10]:
text = ''' Introduction The Blender Game Engine (BGE) is Blender’s tool for real time projects, from architectural visualizations and simulations to games. A word of warning, before you start any big or serious project with the Blender Game Engine, you should note that it is currently not very supported and that there are plans for its retargeting and refactoring that, in the very least, will break compatibility. For further information, you should get in touch with the developers via mailing list or IRC and read the development roadmap. Use Cases and Sample Games Blender has its own built in Game Engine that allows you to create interactive 3D applications or simulations. The major difference between Game Engine and the conventional Blender system is in the rendering process. In the normal Blender engine, images and animations are built off-line – once rendered they cannot be modified. Conversely, the Blender Game Engine renders scenes continuously in real-time, and incorporates facilities for user interaction during the rendering process.'''
In [11]:
response = NLUA(text)
response
Out[11]:
{'entities': [{'count': 1, 'relevance': 0.325141, 'text': 'Blender', 'type': 'Company'}, {'count': 1, 'relevance': 0.109868, 'text': 'BGE', 'type': 'Organization'}], 'keywords': [{'relevance': 0.968481, 'text': 'Blender Game Engine'}, {'relevance': 0.672281, 'text': 'normal Blender engine'}, {'relevance': 0.642034, 'text': 'Blender’s tool'}, {'relevance': 0.585455, 'text': 'conventional Blender'}, {'relevance': 0.544693, 'text': 'real time projects'}, {'relevance': 0.516305, 'text': 'Engine renders scenes'}, {'relevance': 0.50586, 'text': 'interactive 3D applications'}, {'relevance': 0.503355, 'text': 'rendering process'}, {'relevance': 0.440065, 'text': 'architectural visualizations'}, {'relevance': 0.42788, 'text': 'development roadmap'}, {'relevance': 0.415892, 'text': 'mailing list'}, {'relevance': 0.415685, 'text': 'major difference'}, {'relevance': 0.411704, 'text': 'Sample Games'}, {'relevance': 0.411643, 'text': 'user interaction'}, {'relevance': 0.358522, 'text': 'simulations'}, {'relevance': 0.348923, 'text': 'retargeting'}, {'relevance': 0.326577, 'text': 'compatibility'}, {'relevance': 0.325289, 'text': 'warning'}, {'relevance': 0.324025, 'text': 'Introduction'}, {'relevance': 0.322093, 'text': 'BGE'}, {'relevance': 0.321985, 'text': 'IRC'}, {'relevance': 0.320995, 'text': 'plans'}, {'relevance': 0.320874, 'text': 'touch'}, {'relevance': 0.320867, 'text': 'information'}, {'relevance': 0.320457, 'text': 'word'}, {'relevance': 0.319156, 'text': 'developers'}, {'relevance': 0.319064, 'text': 'Cases'}], 'language': 'en', 'usage': {'features': 2, 'text_characters': 1050, 'text_units': 1}}
In [12]:
keywords = pd.DataFrame(response['keywords'])
keywords
Out[12]:
| | relevance | text |
---|---|---|
0 | 0.968481 | Blender Game Engine |
1 | 0.672281 | normal Blender engine |
2 | 0.642034 | Blender’s tool |
3 | 0.585455 | conventional Blender |
4 | 0.544693 | real time projects |
5 | 0.516305 | Engine renders scenes |
6 | 0.505860 | interactive 3D applications |
7 | 0.503355 | rendering process |
8 | 0.440065 | architectural visualizations |
9 | 0.427880 | development roadmap |
10 | 0.415892 | mailing list |
11 | 0.415685 | major difference |
12 | 0.411704 | Sample Games |
13 | 0.411643 | user interaction |
14 | 0.358522 | simulations |
15 | 0.348923 | retargeting |
16 | 0.326577 | compatibility |
17 | 0.325289 | warning |
18 | 0.324025 | Introduction |
19 | 0.322093 | BGE |
20 | 0.321985 | IRC |
21 | 0.320995 | plans |
22 | 0.320874 | touch |
23 | 0.320867 | information |
24 | 0.320457 | word |
25 | 0.319156 | developers |
26 | 0.319064 | Cases |
In [13]:
keywords.to_csv('keywords.csv')
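If pandas is not available, the standard-library `csv` module can write a similar file, without the index column that pandas adds. A minimal sketch, assuming a keyword list shaped like `response['keywords']` above; the function name `write_keywords_csv` and the file name `keywords_sample.csv` are illustrative.

```python
import csv


def write_keywords_csv(keywords, path):
    """Write a list of {'relevance': ..., 'text': ...} dicts to a CSV file."""
    with open(path, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.DictWriter(handle, fieldnames=['relevance', 'text'])
        writer.writeheader()
        writer.writerows(keywords)


# First three keywords from the Out[11] response above.
sample_keywords = [
    {'relevance': 0.968481, 'text': 'Blender Game Engine'},
    {'relevance': 0.672281, 'text': 'normal Blender engine'},
    {'relevance': 0.642034, 'text': 'Blender’s tool'},
]
write_keywords_csv(sample_keywords, 'keywords_sample.csv')
```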