Description
I am having some difficulty calculating row means using c_across() with labelled variables. The following code recreates two data frames I am using. The problem occurs with df1 and not with df2, although the code is essentially the same. I think there is something about the way that value labels have been assigned to the variables in df1 that is causing the problem. Please note that using rowMeans() does work just fine; the problem appears to be the interaction of labelled variables in a particular way with c_across(). I really like the way that c_across() does not necessitate reattaching variables to the data.frame. In the olden times, I would have used mutate_at() to get at this.
Note: I often use car::Recode() rather than dplyr::recode() because I find the syntax a little simpler, and because of path dependency: I have a ton of historic code that relies on it; I think dplyr::recode().
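For comparison, here is a minimal sketch of the same 0 to 1 recode written with dplyr::recode(). It assumes a plain numeric input; the toy vector x is hypothetical and not taken from the data frames in this report. Because the input carries no label attributes, the result is an ordinary double vector.

```r
library(dplyr)

# Hypothetical plain numeric input (no haven labels attached)
x <- c(3, 5, 5, 8, NA)

# Numeric values are matched by backticked name; anything unmatched becomes NA
recoded <- recode(
  x,
  `1` = 1, `3` = 0.75, `5` = 0.25, `7` = 0, `8` = 0.5,
  .default = NA_real_
)

recoded
```

Whether this fits existing code that relies on car::Recode() is a separate question; the sketch only shows the syntax side by side.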
The code below returns the following error:
Error: Problem with mutate() input market_liberalism.
x labels must be unique.
Input market_liberalism is mean(c_across(market1:market2)).
The error occurred in row 1.
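Since the message points at value labels, one thing worth checking is whether the two recoded columns carry label sets that disagree with each other. The sketch below does that on toy vectors; m1 and m2 are hypothetical stand-ins for market1 and market2, and it assumes the recode leaves the original value labels attached.

```r
# Hypothetical stand-ins for the two recoded columns, with differing label sets
m1 <- haven::labelled(c(0.75, 0.25),
                      labels = c(`Strongly agree` = 1, Stronglydisagree = 7))
m2 <- haven::labelled(c(0.25, 0.5),
                      labels = c(`Strongly agree` = 1, Stronglydisagree = 4))

l1 <- labelled::val_labels(m1)
l2 <- labelled::val_labels(m2)

# Are the two label sets the same?
same_labels <- identical(l1, l2)

# Label names used by both columns but mapped to different values
shared <- intersect(names(l1), names(l2))
conflicts <- shared[l1[shared] != l2[shared]]

same_labels
conflicts
```

If the real columns show the same kind of mismatch, that would be consistent with the error appearing only for df1, though this check alone does not prove it is the cause.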
#Install car package if necessary
#install.packages('car')
library(tidyverse)
library(car)
library(labelled)
#this recreates df1
df1 <- structure(list(
  PESE15 = structure(c(3, 5, 5, 8, NA),
    label = "The Government Should Leave it Entirely to the Private Sector to Create Jobs",
    na_values = c(8, 9), format.spss = "F1.0", display_width = 0L,
    labels = c(`Strongly agree` = 1, `Somewhat agree` = 3, Somewhatdisagree = 5,
      Stronglydisagree = 7, D.K. = 8, Refused = 9),
    class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr", "double")),
  MBSA2 = structure(c(3, 8, 1, 1, NA),
    label = "People Who Do Not Get Ahead Should Blame Themselves Not the System",
    na_values = 8, format.spss = "F1.0", display_width = 0L,
    labels = c(`Strongly agree` = 1, Agree = 2, Disagree = 3, Stronglydisagree = 4,
      `No opinion` = 8),
    class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr", "double"))),
  row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"),
  label = "NSDstat generated file")
# Use car::Recode() to convert the values to a 0 to 1 scale
df1$market1 <- car::Recode(df1$PESE15, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA")
df1$market2 <- car::Recode(df1$MBSA2, "1=1; 2=0.75; 3=0.25; 4=0; 8=0.5; else=NA")
# Use dplyr::c_across() to try to calculate the average (this raises the error above)
df1 %>%
  rowwise() %>%
  mutate(market_liberalism = mean(c_across(market1:market2)))
# Using rowMeans() does work.
df1 %>%
  select(market1:market2) %>%
  mutate(market_liberalism = rowMeans(., na.rm = TRUE))
# That works, but then it is somewhat difficult to get the result back into the original data.frame.
# Setting the value labels to NULL makes c_across() work.
val_labels(df1$market1) <- NULL
val_labels(df1$market2) <- NULL
# Try again
df1 %>%
  rowwise() %>%
  mutate(market_liberalism = mean(c_across(market1:market2)))
#This makes df2, similar dataset
df2 <- structure(list(
  cpsf6 = structure(c(3, 7, 7, 1, 7, 7),
    label = "The Government Should Leave it Entirely to the Private Sector to Create Jobs",
    na_values = c(8, 9), format.spss = "F1.0", display_width = 0L,
    labels = c(`Strongly Agree` = 1, `Somewhat Agree` = 3, SomewhatDisagree = 5,
      StronglyDisagree = 7, D.K. = 8, Refused = 9),
    class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr", "double")),
  pese19 = structure(c(3, 7, 3, 1, NA, 5),
    label = "People Who Do Not Get Ahead Should Blame Themselves, Not the System",
    na_values = c(8, 9), format.spss = "F1.0", display_width = 0L,
    labels = c(`Strongly Agree` = 1, `Somewhat Agree` = 3, SomewhatDisagree = 5,
      StronglyDisagree = 7, D.K. = 8, Refused = 9),
    class = c("haven_labelled_spss", "haven_labelled", "vctrs_vctr", "double"))),
  row.names = c(NA, -6L), class = c("tbl_df", "data.frame"))
# Use car::Recode()
df2$market1 <- car::Recode(df2$cpsf6, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric = TRUE)
df2$market2 <- car::Recode(df2$pese19, "1=1; 3=0.75; 5=0.25; 7=0; 8=0.5; else=NA", as.numeric = TRUE)
# The same c_across() call works on df2
df2 %>%
  rowwise() %>%
  mutate(market_liberalism = mean(c_across(market1:market2), na.rm = TRUE))
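If the value labels on the recoded columns are not needed, one possible workaround that keeps the result in the original data frame is to drop the labels before averaging. The sketch below assumes exactly that. The tibble toy is a hypothetical stand-in for df1, and haven::zap_labels() is swapped in for the val_labels() assignments used above.

```r
library(dplyr)
library(haven)

# Hypothetical stand-in for df1: two labelled columns whose label sets differ
toy <- tibble(
  market1 = labelled(c(0.75, 0.25, 0.25, 0.5, NA),
                     labels = c(`Strongly agree` = 1, Stronglydisagree = 7)),
  market2 = labelled(c(0.25, 0.5, 1, 1, NA),
                     labels = c(`Strongly agree` = 1, Stronglydisagree = 4))
)

# Option 1: strip the labels, then average row by row with c_across()
out1 <- toy %>%
  mutate(across(market1:market2, zap_labels)) %>%
  rowwise() %>%
  mutate(market_liberalism = mean(c_across(market1:market2), na.rm = TRUE)) %>%
  ungroup()

# Option 2: strip the labels, then call rowMeans() inside mutate(),
# so the new column is added without leaving the data frame
out2 <- toy %>%
  mutate(across(market1:market2, zap_labels)) %>%
  mutate(market_liberalism = rowMeans(select(., market1:market2), na.rm = TRUE))
```

Both options overwrite market1 and market2 with plain doubles, so any label metadata on those two columns is lost; the original labelled source columns would be untouched.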
Results from sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] labelled_2.8.0 cesdata_0.1.0 car_3.0-10 carData_3.0-4 forcats_0.5.1 stringr_1.4.0
[7] dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.3
[13] tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lubridate_1.7.10 lattice_0.20-41 assertthat_0.2.1 psych_2.1.3 utf8_1.2.1
[7] R6_2.5.0 cellranger_1.1.0 backports_1.2.1 reprex_1.0.0 httr_1.4.2 pillar_1.6.0
[13] rlang_0.4.11 curl_4.3 readxl_1.3.1 rstudioapi_0.13 data.table_1.14.0 foreign_0.8-81
[19] munsell_0.5.0 broom_0.7.5 compiler_4.0.4 modelr_0.1.8 pkgconfig_2.0.3 mnormt_2.0.2
[25] tmvnsim_1.0-2 tidyselect_1.1.1 rio_0.5.26 fansi_0.4.2 withr_2.4.1 crayon_1.4.1
[31] dbplyr_2.1.0 grid_4.0.4 nlme_3.1-152 jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0
[37] DBI_1.1.1 magrittr_2.0.1 scales_1.1.1 zip_2.1.1 cli_2.5.0 stringi_1.5.3
[43] fs_1.5.0 xml2_1.3.2 ellipsis_0.3.2 generics_0.1.0 vctrs_0.3.8 openxlsx_4.2.3
[49] tools_4.0.4 glue_1.4.2 hms_1.0.0 abind_1.4-5 parallel_4.0.4 colorspace_2.0-0
[55] rvest_1.0.0 haven_2.4.1.9000