Performance of fill() with labelled data after group_by() #658

This problem is very similar to tidyverse/tidyr#520. When the data has a large number of groups, `fill()` is much slower on a `haven::labelled` column than on a plain numeric column holding the same values.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(haven)

set.seed(2)
n <- 1e4
my_sample <- sample(c(1:10, NA), n, replace = TRUE)
df <- tibble(
  group = sample(paste("id", 1:(n/4)), n, replace = TRUE),
  num = my_sample,
  lab = haven::labelled(my_sample)
) %>% 
  group_by(group)

bench::mark(
  num = fill(df, num, .direction = "updown"),
  lab = fill(df, lab, .direction = "updown"),
  check = FALSE
)[1:4]
#> # A tibble: 2 x 4
#>   expression      min   median `itr/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl>
#> 1 num         37.09ms  41.65ms    23.3  
#> 2 lab           1.99s    1.99s     0.502
#> Warning message:
#> Some expressions had a GC in every iteration; so filtering is disabled. 
```
Note that the two timings are similar when the data is not grouped:
```r
set.seed(2)
n <- 1e4
my_sample <- sample(c(1:10, NA), n, replace = TRUE)
df <- tibble(
  num = my_sample,
  lab = haven::labelled(my_sample)
) 

bench::mark(
  num = fill(df, num, .direction = "updown"),
  lab = fill(df, lab, .direction = "updown"),
  check = FALSE
)[1:4]
#> # A tibble: 2 x 4
#>   expression      min   median `itr/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl>
#> 1 num          22.4ms   24.4ms      39.4
#> 2 lab          26.1ms   32.9ms      29.7
```
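Since the slowdown only shows up for the labelled column, one possible workaround is to fill the underlying values and re-attach the label metadata afterwards. The sketch below is only an illustration and has not been benchmarked here: `fill_labelled()` is a hypothetical helper, not part of `tidyr` or `haven`, and it assumes a plain `haven::labelled()` column (not `labelled_spss()` with user-defined missing values).

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(haven)

# Hypothetical helper (not part of tidyr or haven): fill a labelled column
# via its underlying values, then restore the label metadata.
fill_labelled <- function(data, col, .direction = "down") {
  original <- data[[col]]

  # Strip the labelled class so fill() works on a plain vector
  data[[col]] <- zap_labels(original)
  data <- fill(data, all_of(col), .direction = .direction)

  # Rebuild the labelled vector from the attributes saved above
  data[[col]] <- labelled(
    data[[col]],
    labels = attr(original, "labels"),
    label = attr(original, "label")
  )
  data
}
```

With the grouped `df` from the first example, this would be called as `fill_labelled(df, "lab", .direction = "updown")`.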
---
Benchmarks were run with the development versions of `tidyr`, `dplyr` and `haven`.
