Main contents of this post:
sklearn.feature_extraction
Purpose: turn dictionary data into feature values
# Data
[{'city': '北京', 'temperature': 100},
 {'city': '上海', 'temperature': 60},
 {'city': '深圳', 'temperature': 30}]
# Code
from sklearn.feature_extraction import DictVectorizer

def dict_demo():
    data = [{'city': '北京', 'temperature': 100},
            {'city': '上海', 'temperature': 60},
            {'city': '深圳', 'temperature': 30}]
    # 1. Instantiate a transformer
    transfer = DictVectorizer(sparse=False)
    # 2. Call fit_transform
    data_new = transfer.fit_transform(data)
    print("data_new:\n", data_new)
    # Print the feature names
    # (use get_feature_names() on scikit-learn older than 1.0)
    print("Feature names:\n", transfer.get_feature_names_out())
    return None
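Note: the sparse parameter of DictVectorizer defaults to True, which makes fit_transform return a sparse matrix. With sparse=False it returns an ordinary dense array.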
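To make the difference concrete, here is a minimal sketch that runs the same data through both settings and compares the results. The variable names (sparse_result, dense_result) are just for illustration, and it assumes scipy and numpy are installed alongside scikit-learn.

```python
import numpy as np
from scipy import sparse as sp
from sklearn.feature_extraction import DictVectorizer

data = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 60},
    {'city': '深圳', 'temperature': 30},
]

# Default (sparse=True): a scipy sparse matrix that stores only non-zero entries
sparse_result = DictVectorizer().fit_transform(data)

# sparse=False: a plain NumPy array with every zero written out
dense_result = DictVectorizer(sparse=False).fit_transform(data)

print(type(sparse_result), sp.issparse(sparse_result))
print(type(dense_result), dense_result.shape)

# Both hold the same values; only the storage format differs
print(np.array_equal(sparse_result.toarray(), dense_result))
```

Each distinct city becomes its own one-hot column, and temperature stays numeric, so the result has three rows and four columns. The sparse form only pays off when there are many categories and most entries are zero.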
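Purpose: turn text data into feature values
sklearn.feature_extraction.text.CountVectorizer(stop_words=[])
CountVectorizer.fit_transform(X)
    X: a text, or an iterable of text strings
    Returns: a sparse matrix
CountVectorizer.inverse_transform(X)
    X: an array or a sparse matrix
    Returns: the data in its pre-transformation form (the words with non-zero counts in each document)
CountVectorizer.get_feature_names() (get_feature_names_out() in scikit-learn 1.0 and later)
    Returns: the list of words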
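The calls listed above can be sketched together in one short run. This is only an illustration: the stop word list ["is", "too"] is an arbitrary choice, and the word list is read from the fitted vocabulary_ attribute so the snippet does not depend on which feature-name method your scikit-learn version provides.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["life is short,i like like python",
        "life is too long,i dislike python"]

# stop_words lists words to leave out of the vocabulary (illustrative choice)
transfer = CountVectorizer(stop_words=["is", "too"])
counts = transfer.fit_transform(docs)  # sparse matrix of word counts

# vocabulary_ maps each word to its column index; sort words by column
vocab = sorted(transfer.vocabulary_, key=transfer.vocabulary_.get)
print(vocab)
# ['dislike', 'life', 'like', 'long', 'python', 'short']

print(counts.toarray())
# [[0 1 2 0 1 1]
#  [1 1 0 1 1 0]]

# inverse_transform gives back, per document, the words with non-zero counts
recovered = transfer.inverse_transform(counts)
print(recovered)
```

Notice that the single letter "i" never shows up: the default tokenizer only keeps tokens of two or more characters. Also, inverse_transform does not restore the original sentences, since word order and repeat counts are not kept.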
sklearn.feature_extraction.text.TfidfVectorizer
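TfidfVectorizer is only named here, so the following is a rough sketch of how it can be used with its default settings, not a full treatment. It follows the same fit_transform pattern as CountVectorizer but returns TF-IDF weights instead of raw counts. The sample sentences are the ones used elsewhere in this post.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["life is short,i like like python",
        "life is too long,i dislike python"]

transfer = TfidfVectorizer()
tfidf = transfer.fit_transform(docs)  # sparse matrix, like CountVectorizer
dense = tfidf.toarray()
print(dense.round(3))

# With the default norm='l2', each document row has unit length
row_norms = np.linalg.norm(dense, axis=1)
print(row_norms)

# "like" occurs twice and only in the first document, so it outweighs
# "life", which occurs once in both documents
like_col = transfer.vocabulary_["like"]
life_col = transfer.vocabulary_["life"]
print(dense[0, like_col] > dense[0, life_col])
```

Words that appear in every document get a lower weight, while words that are frequent in one document and rare in the others get a higher one.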
# Data
["life is short,i like python",
 "life is too long,i dislike python"]
# Code
from sklearn.feature_extraction.text import CountVectorizer

def count_demo():
    data = ["life is short,i like like python",
            "life is too long,i dislike python"]
    transfer = CountVectorizer()
    data_new = transfer.fit_transform(data)
    # toarray() converts the sparse result to a dense array for printing
    print("data_new:\n", data_new.toarray())
    # use get_feature_names() on scikit-learn older than 1.0
    print("Feature names:\n", transfer.get_feature_names_out())
    return None
Note that the code calls toarray(). Try removing that call and running it again to see what changes.
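If you want to see the difference without editing the function, here is a small sketch that prints both forms side by side. It assumes scipy is available (it is installed with scikit-learn). The exact text shown for a sparse matrix varies between scipy versions, so no output is given for that line.

```python
from scipy import sparse as sp
from sklearn.feature_extraction.text import CountVectorizer

data = ["life is short,i like like python",
        "life is too long,i dislike python"]

data_new = CountVectorizer().fit_transform(data)

# Without toarray(): the sparse matrix prints as (row, column) value entries
print(sp.issparse(data_new))
print(data_new)

# With toarray(): a dense 2 x 8 array, zeros included
dense = data_new.toarray()
print(dense)
# [[0 1 1 2 0 1 1 0]
#  [1 1 1 0 1 1 0 1]]

# nnz is the number of stored non-zero entries
print(data_new.nnz, "of", dense.size, "entries are non-zero")
```

The sparse printout lists only the positions that hold a count, which is why it looks so different from the dense table.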