This problem is very similar to tidyverse/tidyr#520. When there is a large number of groups, fill()
is much slower on a labelled column than on a plain numeric column holding the same values
(a median of about 2 s versus about 42 ms in the benchmark below).
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(haven)
set.seed(2)
n <- 1e4
my_sample <- sample(c(1:10, NA), n, replace = TRUE)
df <- tibble(
  group = sample(paste("id", 1:(n/4)), n, replace = TRUE),
  num = my_sample,
  lab = haven::labelled(my_sample)
) %>%
  group_by(group)
bench::mark(
  num = fill(df, num, .direction = "updown"),
  lab = fill(df, lab, .direction = "updown"),
  check = FALSE
)[1:4]
#> # A tibble: 2 x 4
#>   expression      min   median `itr/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl>
#> 1 num         37.09ms  41.65ms    23.3
#> 2 lab           1.99s    1.99s     0.502
#> Warning message:
#> Some expressions had a GC in every iteration; so filtering is disabled.
Note that the timings for the two columns are similar when the data is not grouped:
set.seed(2)
n <- 1e4
my_sample <- sample(c(1:10, NA), n, replace = TRUE)
df <- tibble(
  num = my_sample,
  lab = haven::labelled(my_sample)
)
bench::mark(
  num = fill(df, num, .direction = "updown"),
  lab = fill(df, lab, .direction = "updown"),
  check = FALSE
)[1:4]
#> # A tibble: 2 x 4
#>   expression      min   median `itr/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl>
#> 1 num          22.4ms   24.4ms      39.4
#> 2 lab          26.1ms   32.9ms      29.7
Done with the development versions of tidyr, dplyr and haven.
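In the meantime, one possible way to sidestep the labelled code path is to fill the underlying values and then re-attach the labels. The sketch below is only an illustration: `fill_labelled()` is a hypothetical helper, not part of tidyr or haven. It assumes the column is a plain `haven::labelled()` vector (no SPSS-style user-defined missing values) and that the column name is passed as a string. It has not been benchmarked here.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(haven)

# Hypothetical helper (not part of tidyr or haven): fill a labelled column
# by filling its unlabelled values, then restoring the original labels.
fill_labelled <- function(data, col, .direction = "down") {
  original <- data[[col]]

  # Drop the labelled class so fill() works on a plain vector.
  data[[col]] <- zap_labels(original)
  data <- fill(data, all_of(col), .direction = .direction)

  # Re-attach value labels and the variable label.
  # exact = TRUE avoids partial matching of "label" against "labels".
  data[[col]] <- labelled(
    data[[col]],
    labels = attr(original, "labels", exact = TRUE),
    label = attr(original, "label", exact = TRUE)
  )
  data
}
```

With the grouped `df` from the first example, this would be called as `fill_labelled(df, "lab", .direction = "updown")`.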