Applying conditionals across many columns based on string indicators within column names

Applying conditionals across many columns based on string indicators within column names

I am wondering if there is any type of R magic, either tidyverse, or any other, that allows you to apply conditionals on one or more variables that share a common string within each of the column names.

Here's an example. Say we have substance use alcohol alc and cannabis cann over a period. Participants get asked if they have used any of each substance in the period. If they answer no their values for freqDays and quant are NA, whereas they should be 0.

set.seed(1)
df <- data.frame(alc_yn = factor(x = rep(x = c("no", "yes"), 
                                         each = 5),
                                 levels = c("no", "yes")),
                 freqDays_alc = c(rep(NA,5), sample(1:10,5,replace=F)),
                 quant_alc =  c(rep(NA,5), sample(1:50,5,replace=F)),
                 cann_yn = factor(x = rep(x = c("yes", "no"), 
                                         each = 5),
                                 levels = c("no", "yes")),
                 freqDays_cann = c(sample(1:10,5,replace=F),rep(NA,5)),
                 quant_cann = c(sample(1:50,5,replace=F),rep(NA,5)))

df

#    alc_yn freqDays_alc     quant_alc cann_yn freqDays_cann     quant_cann
# 1      no           NA            NA     yes             5             10
# 2      no           NA            NA     yes            10              7
# 3      no           NA            NA     yes             2              9
# 4      no           NA            NA     yes             6             15
# 5      no           NA            NA     yes             7             21
# 6     yes            9            23      no            NA             NA
# 7     yes            4            43      no            NA             NA
# 8     yes            7            14      no            NA             NA
# 9     yes            1            18      no            NA             NA
# 10    yes            2            33      no            NA             NA

I would like to apply a conditional so that whenever the _yn variable is no for each of these variables, it changes the values in the freqDays_ and quant_ columns of the corresponding columns to 0. It should look like this.

df_sol <- data.frame(alc_yn = factor(x = rep(x = c("no", "yes"), 
                                         each = 5),
                                 levels = c("no", "yes")),
                     freqDays_alc = c(rep(0,5), sample(1:10,5,replace=F)),
                     quant_alc =  c(rep(0,5), sample(1:50,5,replace=F)),
                     cann_yn = factor(x = rep(x = c("yes", "no"), 
                                              each = 5),
                                      levels = c("no", "yes")),
                     freqDays_alc = c(sample(1:10,5,replace=F),rep(0,5)),
                     quant_cann = c(sample(1:50,5,replace=F),rep(0,5)))

df_sol

#    alc_yn freqDays_alc     quant_alc cann_yn freqDays_alc.1     quant_cann
# 1      no            0             0     yes              1             42
# 2      no            0             0     yes              4             38
# 3      no            0             0     yes              3             47
# 4      no            0             0     yes              6             20
# 5      no            0             0     yes              2             28
# 6     yes            5            37      no              0              0
# 7     yes            9            34      no              0              0
# 8     yes            1            42      no              0              0
# 9     yes            6            25      no              0              0
# 10    yes           10            44      no              0              0

I'd prefer a solution that doesn't involve pivoting longer and wider several times, but if there's no other way that will do.

Answer

Here's an approach in {tidyverse}.

First, I find the substances by extracting the column names that have _yn in their names using str_extract() and str_detect(). Then I loop over each substance (i.e. subst) with reduce(). Next, I do a conditional replacement (case_when()) for freqDays_{subst} and quant_{subst} columns using across() and matches(); glue() is helpful to create the column names for each substance.

library(tidyverse)

df %>%
    reduce(str_extract(names(.)[str_detect(names(.), "_yn$")], "^[^_]+"), 
           \(data, subst) {
      data %>%
        mutate(across(matches(glue::glue("(freqDays|quant)_{subst}$")), 
                      ~ case_when(
                        .data[[glue::glue("{subst}_yn")]] == "no" ~ 0,
                        TRUE ~ .x
                      )))
    }, .init = .)

#>    alc_yn freqDays_alc quant_alc cann_yn freqDays_cann quant_cann
#> 1      no            0         0     yes             5         10
#> 2      no            0         0     yes            10          7
#> 3      no            0         0     yes             2          9
#> 4      no            0         0     yes             6         15
#> 5      no            0         0     yes             7         21
#> 6     yes            9        23      no             0          0
#> 7     yes            4        43      no             0          0
#> 8     yes            7        14      no             0          0
#> 9     yes            1        18      no             0          0
#> 10    yes            2        33      no             0          0

Created on 2025-06-29 with reprex v2.1.1

Enjoyed this question?

Check out more content on our blog or follow us on social media.

Browse more questions