purrr 0.2.0

Date: 2019-11-09


Hadley Wickham

2016-01-06

Categories: Packages tidyverse

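I'm pleased to announce the release of purrr 0.2.0. Purrr fills in the missing pieces in R's functional programming tools, and helps you write functions that are purer and type-stable.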

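I'm still working out what purrr should do, and how it compares with existing functionality in base R, dplyr, and tidyr. One main insight that shaped the current version is that functions designed for programming should be type-stable. Type stability is an idea from Julia that caught my attention. Even though functions in R and Julia can return different types of output, by and large you should strive to write functions that always return the same type of data structure. This makes functions more robust to varying input, and makes them easier to reason about. (Not every function can be type-stable, however.)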
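To make the idea concrete, here is a small base R sketch of a type-unstable operation and a type-stable variant of it. The data frame and variable names are made up for illustration and are not part of the release.

```r
# Hypothetical example data
df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

# Type-unstable: the kind of object returned depends on how many
# columns happen to be selected.
one_col  <- df[, "a"]           # an integer vector
two_cols <- df[, c("a", "b")]   # a data frame

# Type-stable: with drop = FALSE the result is always a data frame.
one_col_stable <- df[, "a", drop = FALSE]

class(one_col)
class(two_cols)
class(one_col_stable)
```

Code that receives `one_col` has to cope with two possible shapes, while code that receives `one_col_stable` does not.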

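Purrr 0.2.0 adds type-stable alternatives for maps, flattens and try(), as described below. There are also many other minor improvements, bug fixes, and a number of deprecations. Please see the release notes for a complete list of changes.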

Type-stable maps

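A map is a function that applies another function to each element of a vector. The map functions in base R are the *apply family: lapply(), sapply(), vapply(), and so on. lapply() is type-stable: no matter what the input is, it always returns a list. sapply() is not type-stable: it returns different types of output depending on its input. The following code shows a simple example in which sapply() returns a vector, a matrix, or a list, depending on its input.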

library(purrr)

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

df[1:4] %>% sapply(class) %>% str()
#> List of 4
#>  $ a: chr "integer"
#>  $ b: chr "numeric"
#>  $ y: chr [1:2] "POSIXct" "POSIXt"
#>  $ z: chr [1:2] "ordered" "factor"
df[1:2] %>% sapply(class) %>% str()
#>  Named chr [1:2] "integer" "numeric"
#>  - attr(*, "names")= chr [1:2] "a" "b"
df[3:4] %>% sapply(class) %>% str()
#>  chr [1:2, 1:2] "POSIXct" "POSIXt" "ordered" "factor"
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:2] "y" "z"

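This behaviour makes sapply() appropriate for interactive use, since it usually guesses correctly and gives you a useful data structure.
It is not appropriate for use in package or production code, because if the input isn't what you expect, it won't fail and will instead return an unexpected data structure. This typically causes an error further along in the process, so you get a confusing error message and it is difficult to isolate the root cause.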

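Base R has a type-stable version of sapply() called vapply(). It takes an additional argument that determines what the output will be. purrr takes a different approach: instead of one function that does everything, it has multiple functions, one for each common type of output: map_lgl(), map_int(), map_dbl(), map_chr(), and map_df().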

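int stands for integer.
dbl stands for double.
chr stands for a character vector, or string.
dttm stands for a date-time (a date + a time).
lgl stands for logical, TRUE or FALSE.
fctr stands for factor.
date stands for dates.
Of these abbreviations, only lgl, int, dbl and chr appear as map_*() suffixes.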

These functions either produce the specified type of output or throw an error. This forces you to deal with the problem right away:

df[1:4] %>% map_chr(class)
#> Error: Result 3 is not a length 1 atomic vector
df[1:4] %>% map_chr(~ paste(class(.), collapse = "/"))
#>                a                b                y                z
#>        "integer"        "numeric" "POSIXct/POSIXt" "ordered/factor"
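For comparison, the same check can be written with the base R vapply() mentioned earlier. This is only a sketch: the data frame repeats the one used above, and the names `classes` and `failed` are introduced here for illustration.

```r
df <- data.frame(a = 1L, b = 1.5, y = Sys.time(), z = ordered(1))

# FUN.VALUE = character(1) declares that every result must be a single string.
classes <- vapply(
  df,
  function(col) paste(class(col), collapse = "/"),
  FUN.VALUE = character(1)
)

# Returning class() directly fails, because two of the columns have two
# classes each. tryCatch() captures the error message instead of stopping.
failed <- tryCatch(
  vapply(df, class, FUN.VALUE = character(1)),
  error = function(e) conditionMessage(e)
)

classes
failed
```

Both approaches fail early on unexpected output. The difference is where the expected type is stated: in an argument for vapply(), in the function name for map_chr().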

Other variants of map() have similar suffixes. For example, map2() allows you to iterate over two vectors in parallel:

x <- list(1, 3, 5)
y <- list(2, 4, 6)
map2(x, y, c)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 3 4
#>
#> [[3]]
#> [1] 5 6

map2() always returns a list. If you want to add the corresponding values together and store the result as a double vector, you can use map2_dbl():

map2_dbl(x, y, `+`)
#> [1]  3  7 11
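As a small sketch of what the suffix buys you, the snippet below contrasts a function that returns one number per pair with one that does not. The variable names `sums` and `outcome` are illustrative, and the exact error text is deliberately not shown because it differs between purrr versions.

```r
library(purrr)

x <- list(1, 3, 5)
y <- list(2, 4, 6)

# `+` returns a single double for each pair, so map2_dbl() succeeds.
sums <- map2_dbl(x, y, `+`)

# c() returns two values for each pair, which cannot be stored in a
# double vector with one entry per pair, so map2_dbl() throws an error.
outcome <- tryCatch(
  map2_dbl(x, y, c),
  error = function(e) "error"
)

sums
outcome
```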

Another map variant is invoke_map(), which takes a list of functions and a list of arguments. It also has type-stable suffixes:

# IQR is the interquartile range; mad is the median absolute deviation
spread <- list(sd = sd, iqr = IQR, mad = mad)
x <- rnorm(100)

invoke_map_dbl(spread, x = x)
#>        sd       iqr       mad
#> 0.9121309 1.2515807 0.9774154
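One way to read invoke_map_dbl() is as a map over the list of functions, where each function is called on the same data. The following sketch spells that out with map_dbl(). It illustrates the idea and is not how purrr implements it; the seed is set only so the run is reproducible.

```r
library(purrr)

spread <- list(sd = sd, iqr = IQR, mad = mad)

set.seed(1)  # assumption: any seed will do
x <- rnorm(100)

# Iterate over the functions rather than over the data:
# each function f is called as f(x) and must return a single double.
by_hand <- map_dbl(spread, function(f) f(x))

by_hand
```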

Type-stable flatten

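Another situation in which type stability matters is flattening a nested list into a simpler data structure. Base R has unlist(), but it is dangerous because it always succeeds. As an alternative, purrr provides flatten_lgl(), flatten_int(), flatten_dbl(), and flatten_chr():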

x <- list(1L, 2:3, 4L)
x %>% str()
#> List of 3
#>  $ : int 1
#>  $ : int [1:2] 2 3
#>  $ : int 4
x %>% flatten() %>% str()
#> List of 4
#>  $ : int 1
#>  $ : int 2
#>  $ : int 3
#>  $ : int 4
x %>% flatten_int() %>% str()
#>  int [1:4] 1 2 3 4
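The danger of unlist() is easiest to see with mixed input. In this sketch the list `mixed` is a made-up example, and it assumes a purrr version that still provides flatten_int().

```r
library(purrr)

mixed <- list(1L, "a", TRUE)

# unlist() always succeeds: everything is silently coerced to character.
unlisted <- unlist(mixed)

# flatten_int() refuses, because "a" cannot be stored in an integer vector.
outcome <- tryCatch(
  flatten_int(mixed),
  error = function(e) "error"
)

unlisted
outcome
```

With unlist() the problem surfaces later, when something downstream expects integers. With flatten_int() it surfaces at the point where the bad value enters.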

Type-stable `try()`

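Another function in base R that is not type-stable is try(). try() ensures that an expression always succeeds, returning either the original value or the error message: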

str(try(log(10)))
#>  num 2.3
str(try(log("a"), silent = TRUE))
#> Class 'try-error'  atomic [1:1] Error in log("a") : non-numeric argument to mathematical function
#>
#>   ..- attr(*, "condition")=List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language log("a")
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

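safely() is a type-stable version of try(). It always returns a list of two elements, the result and the error, and one of them is always NULL (if there is a result there is no error, and if there is an error there is no result).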

safely(log)(10)
#> $result
#> [1] 2.302585
#>
#> $error
#> NULL
safely(log)("a")
#> $result
#> NULL
#>
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>
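To show why this return shape is type-stable, here is a minimal sketch of a safely()-like wrapper built on base R's tryCatch(). The name `my_safely` is hypothetical and this is not purrr's actual implementation. It only mirrors the behaviour described above.

```r
# Hypothetical stand-in for safely(): wraps f so that it never throws.
my_safely <- function(f, otherwise = NULL) {
  function(...) {
    tryCatch(
      # Success: a result and no error.
      list(result = f(...), error = NULL),
      # Failure: the fallback value and the captured condition object.
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

good <- my_safely(log)(10)
bad  <- my_safely(log)("a")

str(good)
class(bad$error)
```

Either way the caller receives a list with the same two names, so downstream code never has to check which kind of object came back.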

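Notice that safely() takes a function as input and returns a "safe" function, one that never throws an error. A powerful technique is to use safely() and map() together to attempt an operation on each element of a list: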

# Note that safely() takes a function as input, so to apply it to several values, use map()
safe_sqrt <- safely(sqrt, otherwise = NA_real_)
numbers_with_error <- list(1, 2, 3, "spam", 4)
map(numbers_with_error, safe_sqrt) %>% transpose()

safe_log <- safely(log)
x <- list(10, "a", 5)
log_x <- x %>% map(safe_log)

str(log_x)
#> List of 3
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>  $ :List of 2
#>   ..$ result: num 1.61
#>   ..$ error : NULL

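This output is slightly inconvenient, because you would rather have one list of three results and another list of three errors. You can use the new transpose() function to switch the order of the first and second levels in the hierarchy: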

log_x %>% transpose() %>% str()
#> List of 2
#>  $ result:List of 3
#>   ..$ : num 2.3
#>   ..$ : NULL
#>   ..$ : num 1.61
#>  $ error :List of 3
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>   ..$ : NULL
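What transpose() does here can be sketched in base R with two nested lapply() calls. The helper name `transpose_by_name` and the hand-built input are assumptions for illustration. purrr's own function is more general.

```r
# Hand-built input with the same shape as the output of map(x, safe_log)
safe_results <- list(
  list(result = log(10), error = NULL),
  list(result = NULL,    error = simpleError("non-numeric argument")),
  list(result = log(5),  error = NULL)
)

# Hypothetical helper: turns a list of records into a record of lists.
transpose_by_name <- function(x, fields = names(x[[1]])) {
  out <- lapply(fields, function(field) {
    # Pull the same field out of every element; NULLs are kept in place.
    lapply(x, function(el) el[[field]])
  })
  names(out) <- fields
  out
}

flipped <- transpose_by_name(safe_results)
str(flipped, max.level = 2)
```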

This makes it easy to extract the inputs for which the original function failed, or to keep only the successful results:

keep() retains the elements for which the predicate is TRUE.

discard() retains the elements for which the predicate is FALSE.

results <- x %>% map(safe_log) %>% transpose()

(ok <- results$error %>% map_lgl(is_null))
#> [1]  TRUE FALSE  TRUE
(bad_inputs <- x %>% discard(ok))
#> [[1]]
#> [1] "a"
(successes <- results$result %>% keep(ok) %>% flatten_dbl())
#> [1] 2.302585 1.609438

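In general, this takes three steps:

1. Keep the transposed list containing result and error.
2. Use map_lgl(is_null) to get a logical vector.
3. Use the logical vector to pull the values you want out of the lists.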
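The three steps can be wrapped in a small helper. This is a sketch only: `split_results` is a hypothetical name, and it uses base R's is.null() in place of is_null() so that it does not depend on which package exports the latter.

```r
library(purrr)

# Hypothetical helper: apply f to each input and separate successes
# from failures.
split_results <- function(inputs, f) {
  # Step 1: run the safe version of f and transpose the outcome.
  results <- inputs %>% map(safely(f)) %>% transpose()

  # Step 2: TRUE where no error was recorded.
  ok <- map_lgl(results$error, is.null)

  # Step 3: use the logical vector to index the lists.
  list(
    successes  = results$result[ok],
    bad_inputs = inputs[!ok],
    messages   = map_chr(results$error[!ok], conditionMessage)
  )
}

out <- split_results(list(10, "a", 5), log)
str(out)
```

Returning the error messages alongside the bad inputs is an addition made for this sketch. It makes the failures easier to report.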
