Chinese Word Segmentation in R

Two ways to perform Chinese word segmentation in R: Rwordseg and jiebaR

R environment configuration (Windows environment variables):

R_Path:

C:\Program Files\R\R-3.1.2

Path:

%R_Path%

1. Chinese word segmentation with the Rwordseg package

(1) Configure the Java environment variables:

JAVA_HOME:

C:\Program Files\Java\jdk1.8.0_31

Path:

%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin

CLASSPATH:

%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
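
Before installing rJava, it can help to confirm from inside R that the variables above are actually visible to the R session. The following is a minimal sketch in base R; the helper names java_home_ok and path_entries are hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE when JAVA_HOME is non-empty and is an existing directory.
java_home_ok <- function(java_home = Sys.getenv("JAVA_HOME")) {
  nzchar(java_home) && isTRUE(file.info(java_home)$isdir)
}

# Hypothetical helper: split a PATH-style string into its non-empty entries.
# .Platform$path.sep is ";" on Windows and ":" on Unix-like systems.
path_entries <- function(path = Sys.getenv("PATH"), sep = .Platform$path.sep) {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  entries[nzchar(entries)]
}

cat("JAVA_HOME set and valid:", java_home_ok(), "\n")
print(head(path_entries()))
```

If java_home_ok() returns FALSE, restart R after editing the environment variables, since a running session keeps the values it started with.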


(2) Download the Rwordseg package to your local disk. The current version of Rwordseg is available at https://r-forge.r-project.org/R/?group_id=1054

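> install.packages("rJava")

Next, add the following paths to the Path environment variable:

       • %JAVA_HOME%\jre\bin
       • %JAVA_HOME%\jre\bin\server
       • %R_Path%\library\rJava\jri

Then install the downloaded .zip file (a Windows binary package, so type = "win.binary" rather than "source"), replacing the placeholder with the folder that contains it:

> install.packages("<folder containing the downloaded Rwordseg package>/Rwordseg_0.2-1.zip", repos = NULL, type = "win.binary")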
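
To confirm that both installations succeeded before loading anything, the following sketch lists which of the required packages R cannot find. The function name missing_packages is hypothetical; requireNamespace is base R.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- c("rJava", "Rwordseg")
still_missing <- missing_packages(needed)

if (length(still_missing) == 0) {
  cat("All required packages are installed.\n")
} else {
  cat("Not installed yet:", paste(still_missing, collapse = ", "), "\n")
}
```

If rJava is listed as missing even after installation, the usual cause is that the Java paths above are not yet on Path for the current R session.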
(3) Enter the following commands:

> library("rJava")
> library("Rwordseg")

> words = "环卫工因在寒风中烤火取暖被辞退"

> segment.options(isNameRecognition = TRUE) # enable person-name recognition
> segmentCN(words)

Output:

[1] "环卫" "工"   "因"   "在"   "寒风" "中"   "烤火" "取暖" "被"   "辞退"

Changing the input to words = "我的名字是R语言" gives:

[1] "我"    "的"    "名字"  "是"    "R语言"

2. Chinese word segmentation with the jiebaR package

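(1) Enter the following commands:

> install.packages("jiebaR") # install the jiebaR package

> library("jiebaRD") # load the jiebaRD dictionary package

> library("jiebaR")

> words = "环卫工因在寒风中烤火取暖被辞退"
> test = worker() # create a segmentation worker with default settings
> test <= words # jiebaR overloads <= to run the worker on the text

(2) Output: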

[1] "环卫工" "因在"   "寒风"   "中"     "烤火"   "取暖"   "被"     "辞退"
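
The two packages split the same sentence differently. As a rough illustration, the sketch below copies the two result vectors quoted in this article into base R and lists the tokens that appear in only one of them; the variable names are chosen here for illustration, and no segmentation package is needed to run it.

```r
# Token vectors copied from the outputs shown in this article.
rwordseg_tokens <- c("环卫", "工", "因", "在", "寒风", "中", "烤火", "取暖", "被", "辞退")
jiebar_tokens   <- c("环卫工", "因在", "寒风", "中", "烤火", "取暖", "被", "辞退")

# Both segmentations should cover exactly the same characters.
same_text <- identical(paste(rwordseg_tokens, collapse = ""),
                       paste(jiebar_tokens, collapse = ""))
cat("Same underlying text:", same_text, "\n")

# Tokens produced by only one of the two packages.
only_rwordseg <- setdiff(rwordseg_tokens, jiebar_tokens)
only_jiebar   <- setdiff(jiebar_tokens, rwordseg_tokens)
print(only_rwordseg)
print(only_jiebar)
```

The differences are confined to the start of the sentence: Rwordseg splits "环卫工" and "因在" into smaller units, while jiebaR keeps each as a single token.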

Changing the input to words = "我的名字是R语言" gives:

[1] "我"   "的"   "名字" "是"   "R"    "语言"
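
jiebaR also provides a function-call form, segment(), which does the same job as the <= operator and is easier to use inside lapply. The following is a minimal sketch that runs both example sentences through one worker; it assumes jiebaR is installed and simply prints a message otherwise. The exact tokens depend on the dictionary version, so no output is shown.

```r
sentences <- c("环卫工因在寒风中烤火取暖被辞退", "我的名字是R语言")

if (requireNamespace("jiebaR", quietly = TRUE)) {
  # One worker with default settings, reused for every sentence.
  seg <- jiebaR::worker()
  # segment(text, worker) is the function-call equivalent of `worker <= text`.
  results <- lapply(sentences, function(s) jiebaR::segment(s, seg))
  names(results) <- sentences
  print(results)
} else {
  message("jiebaR is not installed; run install.packages(\"jiebaR\") first.")
}
```

Reusing a single worker avoids reloading the dictionary for each sentence, which is the slow part of the setup.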
