Python Data Science (3): Python and Data Science Applications (III)


1. Counting the words in a text with Python

speech_text = ''' I love you, Not for what you are, But for what I am When I am with you. I love you, Not only for what You have made of yourself, But for what You are making of me. I love you For the part of me That you bring out; I love you For putting your hand Into my heaped-up heart And passing over All the foolish, weak things That you can’t help Dimly seeing there, And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find. I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple; Out of the works Of my every day Not a reproach But a song. I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy. You have done it Without a touch, Without a word, Without a sign. You have done it By being yourself. Perhaps that is what Being a friend means, After all. '''

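# Split the text on whitespace to get a list of words
speech = speech_text.split()

# Count how many times each word occurs
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# Show each word together with its count
print(dic.items())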
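One thing to keep in mind with the loop above: `split()` only cuts on whitespace, so "you", "you," and "you." end up as three different keys. Below is a minimal sketch of one way to normalize the words before counting. The helper name `count_words` and the regular expression are illustrative assumptions, not part of the original code.

```python
import re

def count_words(text):
    """Count lowercase words in text, ignoring surrounding punctuation.

    Hypothetical helper for illustration only.
    """
    # The pattern keeps words such as "can't" and "heaped-up" in one piece
    words = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    counts = {}
    for word in words:
        # dict.get returns 0 for a word that has not been seen yet
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "I love you, Not for what you are, But for what I am When I am with you."
counts = count_words(sample)
print(counts["you"])  # "you," "you" and "you." are now counted together
```

Whether punctuation should be stripped depends on the analysis; for a simple word-frequency table it usually should be.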

While using nltk I kept getting errors. The following two lines open the NLTK downloader, which fetches the data that nltk needs:

import nltk
nltk.download()

A downloader window pops up; use it to download nltk.app.

[Screenshot: download in progress]

If the download completes this way, you can skip the next step.

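I tried many times and the download failed every time, so here is a second method: download a prepackaged archive directly. Download link 1: cloud drive, password znx7. Extract the downloaded nltk_data.zip to the root of the C: drive; this is the safest location and keeps nltk from failing to find the data. Download link 2: cloud drive, password 4cp3ui.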

Thanks to V_can ("Python and Natural Language Processing, Part 1: Getting Started with NLTK, Environment Setup") for providing the package.
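After unpacking the archive by hand, it can help to check that the stop-word corpus really is where nltk will look for it. The sketch below is only an assumption about a typical setup: `find_stopwords_dir` is a hypothetical helper and the candidate directories are examples. The one nltk feature it relies on is `nltk.data.path`, the list of directories that nltk searches for data.

```python
import os

def find_stopwords_dir(candidates):
    """Return the first directory that contains corpora/stopwords, or None.

    Hypothetical helper for illustration only.
    """
    for base in candidates:
        # nltk can read the corpus either unpacked or as a zip file
        for name in ("stopwords", "stopwords.zip"):
            if os.path.exists(os.path.join(base, "corpora", name)):
                return base
    return None

# Example locations (assumptions): the C: root used above and the home directory
candidates = [r"C:\nltk_data", os.path.join(os.path.expanduser("~"), "nltk_data")]
data_dir = find_stopwords_dir(candidates)
print("stopwords corpus found under:", data_dir)

try:
    import nltk
    # Tell nltk about the directory if it is not on its search path already
    if data_dir is not None and data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
except ImportError:
    print("nltk is not installed")
```

If the network is usable, `nltk.download('stopwords')` fetches just the stop-word corpus without opening the downloader window.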

Removing stop words

2. Second method: use Counter from the collections module in Python's standard library

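# Code below
from collections import Counter
from nltk.corpus import stopwords

c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove the stop words, then look at the top ten again.
# NLTK's English stop words are lowercase, so capitalized words such as "I" are not removed here.
stop_words = stopwords.words('english')
for sw in stop_words:
    del c[sw]
print(c.most_common(10))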

Counter is a subclass of dict that makes counting convenient.
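A small sketch of what that means in practice, using a made-up word list rather than the poem. It shows the behaviors the code in this post relies on: a missing key counts as 0, and deleting a key that is not present does not raise an error.

```python
from collections import Counter

c = Counter(["love", "you", "love", "me", "love"])

print(isinstance(c, dict))  # Counter is a dict subclass
print(c["love"])            # count of a word that was seen
print(c["missing"])         # unseen words count as 0 instead of raising KeyError

# update() adds further occurrences to the existing counts
c.update(["you", "you"])
print(c.most_common(2))

# Unlike a plain dict, deleting an absent key is silently ignored,
# which is why looping over every stop word with del is safe
del c["not-there"]
```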

  • The complete code
speech_text = ''' I love you, Not for what you are, But for what I am When I am with you. I love you, Not only for what You have made of yourself, But for what You are making of me. I love you For the part of me That you bring out; I love you For putting your hand Into my heaped-up heart And passing over All the foolish, weak things That you can’t help Dimly seeing there, And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find. I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple; Out of the works Of my every day Not a reproach But a song. I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy. You have done it Without a touch, Without a word, Without a sign. You have done it By being yourself. Perhaps that is what Being a friend means, After all. '''

# Lowercase everything so that "You" and "you" are counted as the same word
speech = speech_text.lower().split()
print(speech)

dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

import operator
swd = sorted(dic.items(),key=operator.itemgetter(1),reverse=True)
print(swd)

# Stop word handling
from nltk.corpus import stopwords
stop_words = stopwords.words('english')

for k,v in swd:
    if k not in stop_words:
        print(k,v)


from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Drop the stop words and print the top ten again
for sw in stop_words:
    del c[sw]
print(c.most_common(10))
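The two approaches in the complete code should agree with each other. Here is a minimal sketch that checks this on a short sentence, so it runs without nltk. The `stop_words` set below is a tiny stand-in chosen for illustration, not the real NLTK list.

```python
from collections import Counter
from operator import itemgetter

words = "you have done it without a touch without a word without a sign".split()
stop_words = {"a", "it", "you", "have"}  # illustrative stand-in for the NLTK list

# Approach 1: plain dict, sort by count, then skip the stop words
dic = {}
for w in words:
    dic[w] = dic.get(w, 0) + 1
ranked = [(w, n)
          for w, n in sorted(dic.items(), key=itemgetter(1), reverse=True)
          if w not in stop_words]

# Approach 2: let Counter do the counting and the ranking
c = Counter(w for w in words if w not in stop_words)

print(ranked[:3])
print(c.most_common(3))
```

Filtering the stop words out before building the Counter, as done here, is an alternative to deleting them afterwards with `del`; both leave the same counts.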

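With these two methods it is easy to see why Python is used more and more in data analysis and scientific computing: besides the features of the language itself, there are many libraries that are easy to use.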

So what are you waiting for? Life is short, so why not make Python your song. Come learn Python with me.
