R: replace multiple occurrences of regex-matched strings in dataframe fields by looking them up in another dataframe

I have two dataframes:

df lookup:

oldId <- c(123, 456, 567, 789)
newId <- c(1, 2, 3, 4)
lookup <- data.frame(oldId, newId)

df data:

descr <- c("description with no match",
           "description with one 123 match",
           "description with again no match",
           "description 456 with two 789 matches")
data <- data.frame(descr)

Goal:

I want a new dataframe:

  • same structure as the data df
  • same field values, except that all instances of numbers (i.e. 123, 456, 789) are looked up in the other dataframe, and replaced by lookup$newId.

The resulting dataframe will thus look like this:

  1. "description with no match"
  2. "description with one 1 match"
  3. "description with again no match"
  4. "description 2 with two 4 matches"

So each text in the descr column may contain many numbers that need to be replaced. This is a stripped-down example; my real-life dataframes are much bigger.

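I already have the regex part working:

# gsub() is vectorised, so it can be applied to the whole column;
# lapply() would turn the column into a list
fx <- function(x) gsub("([[:digit:]]{3})", "TESTTEST", x)
data$descr <- fx(data$descr)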

But I have no idea how to let the function loop over all matches in a row, and then let it look up the number and replace it.

Answer

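You can supply a function as the replacement argument of stringr::str_replace_all(). Depending on the stringr version, that function receives either one match at a time or a character vector of all matches, so the lookup below uses match(), which handles both, and as.character(), so that the replacement is a string. Every matched number is assumed to have an entry in lookup$oldId; otherwise match() returns NA.

stringr::str_replace_all(
    descr,
    "([[:digit:]]{3})",
    # x holds the matched digits; return the newId at the same position as the oldId
    \(x) as.character(lookup$newId[match(x, lookup$oldId)])
)
# [1] "description with no match"        "description with one 1 match"
# [3] "description with again no match"  "description 2 with two 4 matches"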
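
If you would rather not depend on stringr, the same lookup can be sketched in base R with gregexpr() and the replacement form of regmatches(). This is only a sketch under a few assumptions: replace_ids() is a hypothetical helper name that does not appear in the question or the answer, the pattern is the question's three-digit regex, and numbers with no entry in lookup are left unchanged rather than turned into NA.

```r
# Sample data from the question
lookup <- data.frame(oldId = c(123, 456, 567, 789), newId = c(1, 2, 3, 4))
data <- data.frame(descr = c("description with no match",
                             "description with one 123 match",
                             "description with again no match",
                             "description 456 with two 789 matches"),
                   stringsAsFactors = FALSE)

# Hypothetical helper (not part of the original answer): swap every
# three-digit number in `text` for its newId. Numbers without a lookup
# entry are kept as they are (an assumption, adjust as needed).
replace_ids <- function(text, lookup) {
  hits <- gregexpr("[[:digit:]]{3}", text)   # match positions for each string
  regmatches(text, hits) <- lapply(regmatches(text, hits), function(ids) {
    out <- as.character(lookup$newId[match(ids, lookup$oldId)])
    unknown <- is.na(out)                    # IDs that are missing from lookup
    out[unknown] <- ids[unknown]
    out
  })
  text
}

# New data frame with the same structure as `data`
result <- data
result$descr <- replace_ids(data$descr, lookup)
```

Here result$descr should hold the four strings listed in the question's goal, while data itself stays untouched. Because match() works on whole vectors, each row is handled in a single lookup regardless of how many numbers it contains.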
