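The reshape2 package is mainly used for reshaping data. Its two main functions are melt and dcast: melt converts data from wide to long, and dcast converts data from long to wide. The melt and dcast functions in data.table are extensions of the corresponding functions in the reshape2 package.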
Since v1.9.6 you no longer need to load reshape2 to use these functions; loading data.table is enough. If you do have to load reshape2, make sure to load it before data.table.
Suppose we have the following data:
library(data.table)
DT <- fread("melt_default.csv")
head(DT)
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
str(DT)
## Classes 'data.table' and 'data.frame': 5 obs. of 5 variables:
##  $ family_id : int 1 2 3 4 5
##  $ age_mother: int 30 27 26 32 29
##  $ dob_child1: chr "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  $ dob_child2: chr "2000-01-29" NA "2004-04-05" "2009-08-27" ...
##  $ dob_child3: chr NA NA "2007-09-02" "2012-07-21" ...
##  - attr(*, ".internal.selfref")=<externalptr>
DT.m1 <- melt(DT,
              id.vars = c("family_id", "age_mother"),
              measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
DT.m1
##     family_id age_mother   variable      value
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
str(DT.m1)
## Classes 'data.table' and 'data.frame': 15 obs. of 4 variables:
##  $ family_id : int 1 2 3 4 5 1 2 3 4 5 ...
##  $ age_mother: int 30 27 26 32 29 30 27 26 32 29 ...
##  $ variable  : Factor w/ 3 levels "dob_child1","dob_child2",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ value     : chr "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  - attr(*, ".internal.selfref")=<externalptr>
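If melt_default.csv is not at hand, the same wide-to-long step can be tried on a table built inline. The following is a minimal sketch: the values are copied from the printed output above, and the name DT.long is only illustrative.

```r
library(data.table)

# Rebuild the example table inline (values copied from the output shown above),
# so the sketch does not depend on melt_default.csv.
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Wide to long: one row per (family, measured column) pair.
DT.long <- melt(DT,
                id.vars = c("family_id", "age_mother"),
                measure.vars = c("dob_child1", "dob_child2", "dob_child3"))

dim(DT.long)  # 15 rows (5 families x 3 measured columns) and 4 columns
```

Each of the 5 input rows contributes one output row per measured column, which is why the long table has 15 rows.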
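measure.vars specifies the set of columns to collapse into rows. By default the melted columns are named variable and value; both names can be changed in the call with variable.name and value.name. If neither id.vars nor measure.vars is specified, melt treats every column that is not numeric, integer, or logical as id.vars and issues a warning.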
DT.m1 <- melt(DT,
              measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
              variable.name = "child",
              value.name = "dob")
DT.m1
##     family_id age_mother      child        dob
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
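To see what happens when neither id.vars nor measure.vars is given, the sketch below calls melt with no column arguments on an inline copy of the example table. It assumes the dob columns are character, as in the str output above; the names DT.guess and guess_msg are only illustrative, and the exact warning text may differ between data.table versions.

```r
library(data.table)

# Inline copy of the example table (values taken from the output above).
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Call melt() without id.vars or measure.vars, capturing any warning
# message instead of letting it print.
guess_msg <- NULL
DT.guess <- withCallingHandlers(
  melt(DT),
  warning = function(w) {
    guess_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# The character dob_* columns are not numeric, integer, or logical, so they
# are treated as id.vars; the integer columns family_id and age_mother are melted.
names(DT.guess)
levels(DT.guess$variable)
print(guess_msg)
```

For this table the guess is almost certainly not what is wanted, which is a good reason to pass at least one of id.vars or measure.vars explicitly.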
dcast converts the data from long back to wide.
dcast(DT.m1,family_id+age_mother~ child,value.var = "dob")
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
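One way to convince yourself that this dcast call undoes the earlier melt is a round-trip check. The sketch below uses an inline copy of the example table, and DT.long and DT.wide are illustrative names. The comparison ignores attributes because dcast may set a key on the result.

```r
library(data.table)

# Inline copy of the example table (values taken from the output above).
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Wide to long ...
DT.long <- melt(DT,
                measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child",
                value.name = "dob")

# ... and long back to wide.
DT.wide <- dcast(DT.long, family_id + age_mother ~ child, value.var = "dob")

# Compare column names and contents with the original table.
identical(names(DT.wide), names(DT))
all.equal(DT.wide, DT, check.attributes = FALSE)
```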
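dcast uses a formula interface: the variables on the left of ~ are kept as id columns, and the variable on the right supplies the names of the new columns. value.var names the column whose values fill the wide-format cells. How can we find the number of children in each family?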
dcast(DT.m1,family_id~.,fun.aggregate = function(x)sum(!is.na(x)),value.var = "dob")
##    family_id .
## 1:         1 2
## 2:         2 1
## 3:         3 3
## 4:         4 3
## 5:         5 2
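The same count can also be obtained by dropping the missing birth dates while melting and then aggregating with length. This is a sketch of an alternative, not the approach used above: it relies on the na.rm argument of melt, and the names DT.m2, counts, and n_children are only illustrative.

```r
library(data.table)

# Inline copy of the example table (values taken from the output above).
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Melt and drop rows whose dob is NA, so each remaining row is one child.
DT.m2 <- melt(DT,
              measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
              variable.name = "child",
              value.name = "dob",
              na.rm = TRUE)

# Count the remaining rows per family.
counts <- dcast(DT.m2, family_id ~ ., fun.aggregate = length, value.var = "dob")

# The aggregated column is named "."; give it a descriptive name.
setnames(counts, ".", "n_children")
counts
```

Removing the NA rows up front keeps the aggregation function trivial, at the cost of a long table that no longer has one row per family and child column.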
Reference: Efficient reshaping using data.tables