Applying conditionals across many columns based on string indicators within column names

I am wondering if there is any type of R magic, either tidyverse, or any other, that allows you to apply conditionals on one or more variables that share a common string within each of the column names.
Here's an example. Say we have substance use alcohol alc
and cannabis cann
over a period. Participants get asked if they have used any of each substance in the period. If they answer no their values for freqDays
and quant
are NA, whereas they should be 0.
set.seed(1)
df <- data.frame(alc_yn = factor(x = rep(x = c("no", "yes"),
each = 5),
levels = c("no", "yes")),
freqDays_alc = c(rep(NA,5), sample(1:10,5,replace=F)),
quant_alc = c(rep(NA,5), sample(1:50,5,replace=F)),
cann_yn = factor(x = rep(x = c("yes", "no"),
each = 5),
levels = c("no", "yes")),
freqDays_cann = c(sample(1:10,5,replace=F),rep(NA,5)),
quant_cann = c(sample(1:50,5,replace=F),rep(NA,5)))
df
# alc_yn freqDays_alc quant_alc cann_yn freqDays_cann quant_cann
# 1 no NA NA yes 5 10
# 2 no NA NA yes 10 7
# 3 no NA NA yes 2 9
# 4 no NA NA yes 6 15
# 5 no NA NA yes 7 21
# 6 yes 9 23 no NA NA
# 7 yes 4 43 no NA NA
# 8 yes 7 14 no NA NA
# 9 yes 1 18 no NA NA
# 10 yes 2 33 no NA NA
I would like to apply a conditional so that whenever the _yn
variable is no
for each of these variables, it changes the values in the freqDays_
and quant_
columns of the corresponding columns to 0. It should look like this.
df_sol <- data.frame(alc_yn = factor(x = rep(x = c("no", "yes"),
each = 5),
levels = c("no", "yes")),
freqDays_alc = c(rep(0,5), sample(1:10,5,replace=F)),
quant_alc = c(rep(0,5), sample(1:50,5,replace=F)),
cann_yn = factor(x = rep(x = c("yes", "no"),
each = 5),
levels = c("no", "yes")),
freqDays_alc = c(sample(1:10,5,replace=F),rep(0,5)),
quant_cann = c(sample(1:50,5,replace=F),rep(0,5)))
df_sol
# alc_yn freqDays_alc quant_alc cann_yn freqDays_alc.1 quant_cann
# 1 no 0 0 yes 1 42
# 2 no 0 0 yes 4 38
# 3 no 0 0 yes 3 47
# 4 no 0 0 yes 6 20
# 5 no 0 0 yes 2 28
# 6 yes 5 37 no 0 0
# 7 yes 9 34 no 0 0
# 8 yes 1 42 no 0 0
# 9 yes 6 25 no 0 0
# 10 yes 10 44 no 0 0
I'd prefer a solution that doesn't involve pivoting longer and wider several times, but if there's no other way that will do.
Answer
Here's an approach in {tidyverse}
.
First, I find the substances by extracting the column names that have _yn
in their names using str_extract()
and str_detect()
. Then I loop over each substance (i.e. subst
) with reduce()
. Next, I do a conditional replacement (case_when()
) for freqDays_{subst}
and quant_{subst}
columns using across()
and matches()
; glue()
is helpful to create the column names for each substance.
library(tidyverse)
df %>%
reduce(str_extract(names(.)[str_detect(names(.), "_yn$")], "^[^_]+"),
\(data, subst) {
data %>%
mutate(across(matches(glue::glue("(freqDays|quant)_{subst}$")),
~ case_when(
.data[[glue::glue("{subst}_yn")]] == "no" ~ 0,
TRUE ~ .x
)))
}, .init = .)
#> alc_yn freqDays_alc quant_alc cann_yn freqDays_cann quant_cann
#> 1 no 0 0 yes 5 10
#> 2 no 0 0 yes 10 7
#> 3 no 0 0 yes 2 9
#> 4 no 0 0 yes 6 15
#> 5 no 0 0 yes 7 21
#> 6 yes 9 23 no 0 0
#> 7 yes 4 43 no 0 0
#> 8 yes 7 14 no 0 0
#> 9 yes 1 18 no 0 0
#> 10 yes 2 33 no 0 0
Created on 2025-06-29 with reprex v2.1.1
Enjoyed this question?
Check out more content on our blog or follow us on social media.
Browse more questions