1. The filter() function
filter(.data, ...) has a simple signature: .data is the data object to operate on, and every remaining argument is a filtering condition.
> library(dplyr)
> x<-data.frame(id=1:6,name=c("wang","zhang","li","chen","zhao","song"),shuxue=c(89,85,68,79,96,53),yuwen=c(77,68,86,87,92,63))
> dim(x) #Check the number of rows and columns of the data frame
[1] 6 4
> x
  id  name shuxue yuwen
1  1  wang     89    77
2  2 zhang     85    68
3  3    li     68    86
4  4  chen     79    87
5  5  zhao     96    92
6  6  song     53    63
> x1<-filter(x,name=="zhang")
> x1
  id  name shuxue yuwen
1  2 zhang     85    68
> x2<-filter(x,shuxue>60,yuwen<90) #Multiple conditions are allowed: separating them with commas is the same as joining them with & (all must hold); use | when any one condition is enough
> x2
  id  name shuxue yuwen
1  1  wang     89    77
2  2 zhang     85    68
3  3    li     68    86
4  4  chen     79    87
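The difference between the comma, & and | can be checked directly. The following is a minimal sketch that rebuilds the same data frame x; the variable names both_comma, both_amp and either, and the thresholds 90 and 85, are only illustrative and not from the original post.

```r
library(dplyr)

# Same sample data as above
x <- data.frame(
  id = 1:6,
  name = c("wang", "zhang", "li", "chen", "zhao", "song"),
  shuxue = c(89, 85, 68, 79, 96, 53),
  yuwen = c(77, 68, 86, 87, 92, 63)
)

# Comma-separated conditions are combined with AND, so these two calls
# select the same rows
both_comma <- filter(x, shuxue > 60, yuwen < 90)
both_amp <- filter(x, shuxue > 60 & yuwen < 90)
all.equal(both_comma, both_amp)

# | keeps a row when at least one of the conditions holds
either <- filter(x, shuxue > 90 | yuwen > 85)
as.character(either$name)  # "li" "chen" "zhao"
```

Note that | has to be written inside a single condition; there is no comma-style shorthand for OR.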
2. The arrange() function
Like filter(), arrange() takes simple arguments: apart from the data, the remaining arguments are the sorting criteria.
> x3<-arrange(x,name) #Sort alphabetically by name
> x3
  id  name shuxue yuwen
1  4  chen     79    87
2  3    li     68    86
3  6  song     53    63
4  1  wang     89    77
5  2 zhang     85    68
6  5  zhao     96    92
> x4<-arrange(x,shuxue,desc(yuwen)) #Sort by shuxue ascending, then break ties by yuwen descending (shuxue has no ties here, so desc(yuwen) does not change the result)
> x4
  id  name shuxue yuwen
1  6  song     53    63
2  3    li     68    86
3  4  chen     79    87
4  2 zhang     85    68
5  1  wang     89    77
6  5  zhao     96    92
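Because shuxue has no repeated values in x, the second sort key never comes into play above. The sketch below uses a small made-up data frame (scores, with hypothetical names a to d) that does contain a tie, to show how desc() on a second key breaks it.

```r
library(dplyr)

# Hypothetical data: "a" and "b" share the same shuxue score
scores <- data.frame(
  name = c("a", "b", "c", "d"),
  shuxue = c(80, 80, 70, 90),
  yuwen = c(60, 75, 88, 55),
  stringsAsFactors = FALSE
)

# Ascending by shuxue; within equal shuxue, descending by yuwen
sorted <- arrange(scores, shuxue, desc(yuwen))
sorted$name  # "c" "b" "a" "d"
```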
3. The select() function
The arguments mainly determine which columns are selected. The helper functions that can be used together with select() are:
starts_with(), ends_with(), contains(), matches(), num_range(), one_of(), everything()

> x$shengwu<-c(85,68,78,68,98,96)
> x
  id  name shuxue yuwen shengwu
1  1  wang     89    77      85
2  2 zhang     85    68      68
3  3    li     68    86      78
4  4  chen     79    87      68
5  5  zhao     96    92      98
6  6  song     53    63      96
> select(x,name) #Select a single column
   name
1  wang
2 zhang
3    li
4  chen
5  zhao
6  song
> select(x,starts_with("s")) #Select the columns whose names start with "s"
  shuxue shengwu
1     89      85
2     85      68
3     68      78
4     79      68
5     96      98
6     53      96
> select(x,matches(".e.")) #Select the columns whose names match the regular expression ".e." (an "e" with at least one character on each side)
  yuwen shengwu
1    77      85
2    68      68
3    86      78
4    87      68
5    92      98
6    63      96
> select(x,ends_with("e")) #Select the columns whose names end with "e"
   name shuxue
1  wang     89
2 zhang     85
3    li     68
4  chen     79
5  zhao     96
6  song     53
> select(x,contains("e")) #Select all columns whose names contain "e"
   name shuxue yuwen shengwu
1  wang     89    77      85
2 zhang     85    68      68
3    li     68    86      78
4  chen     79    87      68
5  zhao     96    92      98
6  song     53    63      96
> select(x,-name) #A "-" in front of a column name returns every column except that one
  id shuxue yuwen shengwu
1  1     89    77      85
2  2     85    68      68
3  3     68    86      78
4  4     79    87      68
5  5     96    92      98
6  6     53    63      96
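The helpers num_range(), one_of() and everything() are listed above but not demonstrated. The following is a small sketch on a made-up data frame (df, with hypothetical columns q1 to q3 and note), showing one plausible use of each.

```r
library(dplyr)

# Hypothetical data with numbered columns
df <- data.frame(
  id = 1:2,
  q1 = c(3, 4),
  q2 = c(5, 6),
  q3 = c(7, 8),
  note = c("x", "y")
)

# num_range() builds names from a prefix and a numeric range
names(select(df, num_range("q", 1:2)))      # "q1" "q2"

# one_of() selects columns named in a character vector
names(select(df, one_of(c("id", "note"))))  # "id" "note"

# everything() selects all remaining columns, which is handy for
# moving one column to the front
names(select(df, note, everything()))       # "note" "id" "q1" "q2" "q3"

# Several columns can be dropped at once with -c(...)
names(select(df, -c(id, note)))             # "q1" "q2" "q3"
```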
4. The summarise() function
> x #The original data frame, without the shengwu column added in the previous section
  id  name shuxue yuwen
1  1  wang     89    77
2  2 zhang     85    68
3  3    li     68    86
4  4  chen     79    87
5  5  zhao     96    92
6  6  song     53    63
> summarise(x,sum(shuxue))
  sum(shuxue)
1         470
> summarise(group_by(x,name),sum(shuxue)) #Each name has only one shuxue value here, so the per-group sums equal the original values
    name `sum(shuxue)`
  <fctr>         <dbl>
1   chen            79
2     li            68
3   song            53
4   wang            89
5  zhang            85
6   zhao            96
> summarise(group_by(x,name),sum(shuxue,yuwen)) #Sum of shuxue and yuwen for each name
    name `sum(shuxue, yuwen)`
  <fctr>                <dbl>
1   chen                  166
2     li                  154
3   song                  116
4   wang                  166
5  zhang                  153
6   zhao                  188
> arrange(summarise(group_by(x,name),qiuhe=sum(shuxue,yuwen)),desc(qiuhe)) #Combined with the functions above, the summed values can be sorted
    name qiuhe
  <fctr> <dbl>
1   zhao   188
2   chen   166
3   wang   166
4     li   154
5  zhang   153
6   song   116
> summarise(x,mean(shuxue),sd(shuxue)) #Mean and standard deviation
  mean(shuxue) sd(shuxue)
1     78.33333   15.61623
> summarise(group_by(x,name),a=n()) #n() counts the rows in each group, i.e. how often each name occurs
    name     a
  <fctr> <int>
1   chen     1
2     li     1
3   song     1
4   wang     1
5  zhang     1
6   zhao     1
> summarise_if(x,is.numeric,mean) #Mean of every numeric column
   id   shuxue    yuwen
1 3.5 78.33333 78.83333
> summarise_at(x,c(3,4),mean) #Mean of specific columns, given by position
    shuxue    yuwen
1 78.33333 78.83333
> summarise_each(x[c(1,3,4)],funs(mean,sum)) #Use funs() to apply several summary functions at once (summarise_each() and funs() are deprecated in recent dplyr versions)
  id_mean shuxue_mean yuwen_mean id_sum shuxue_sum yuwen_sum
1     3.5    78.33333   78.83333     21        470       473
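The nested calls above can also be written with the %>% pipe, and the multi-function summary can be expressed with across(). This is a sketch only, assuming dplyr 1.0.0 or later (where across() is available); the variable names totals and stats are illustrative and not from the original post.

```r
library(dplyr)

# Same sample data as in section 1
x <- data.frame(
  id = 1:6,
  name = c("wang", "zhang", "li", "chen", "zhao", "song"),
  shuxue = c(89, 85, 68, 79, 96, 53),
  yuwen = c(77, 68, 86, 87, 92, 63)
)

# Group, sum and sort, read top to bottom instead of inside out
totals <- x %>%
  group_by(name) %>%
  summarise(qiuhe = sum(shuxue, yuwen)) %>%
  arrange(desc(qiuhe))

# Apply several summary functions to several columns; the result
# columns are named <column>_<function>
stats <- x %>%
  summarise(across(c(shuxue, yuwen), list(mean = mean, sum = sum)))
names(stats)  # "shuxue_mean" "shuxue_sum" "yuwen_mean" "yuwen_sum"
```

The column order of stats differs from the summarise_each() output above: across() groups the results by column rather than by function.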
5. The between() function
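> a<-10:30
> between(a,5,15) #between() returns a logical vector: elements that satisfy the condition are marked TRUE (both bounds are inclusive)
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> a[between(a,5,15)] #Index with the logical vector to extract the values that satisfy the condition
[1] 10 11 12 13 14 15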
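Since between() returns a logical vector, it can also serve as a condition inside filter(). The following is a minimal sketch on the same sample data; the bounds 68 and 89 and the variable name mid are chosen only for illustration.

```r
library(dplyr)

# Same sample data as in section 1
x <- data.frame(
  id = 1:6,
  name = c("wang", "zhang", "li", "chen", "zhao", "song"),
  shuxue = c(89, 85, 68, 79, 96, 53),
  yuwen = c(77, 68, 86, 87, 92, 63)
)

# Equivalent to shuxue >= 68 & shuxue <= 89 (both bounds inclusive)
mid <- filter(x, between(shuxue, 68, 89))
as.character(mid$name)  # "wang" "zhang" "li" "chen"
```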