Update Values in a Column Based on Conditions Across Multiple Rows Within Groups

I have a Pandas DataFrame representing data grouped by an ID, and I need to update values in one column based on conditions that involve values in other rows within the same group. Specifically, I want to update a column status to 'resolved' under these conditions:

The status column for the current row is initially 'pending'.
There is at least one other row within the same group_id where status is 'active' and the related_id matches the id of the current row.

import pandas as pd

data = {'id': [1, 2, 3, 4, 5, 6, 7],
        'group_id': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'status': ['pending', 'active', 'pending', 'pending', 'active', 'pending', 'pending'],
        'related_id': [None, None, 1, None, None, 4, 4]}
df = pd.DataFrame(data)
print(df)

   id group_id   status  related_id
 1        A  pending        None
 2        A   active        None
 3        A  pending           1
 4        B  pending        None
 5        B   active        None
 6        B  pending           4
 7        B  pending           4

In this example, rows with id 3, 6, and 7 should have their status updated to 'resolved' because they are 'pending', belong to the same group_id as a row with 'active' status, and their related_id matches the id of that 'active' row (row 1 for id 3, row 4 for id 6 & 7).

The output I want:

   id group_id    status  related_id
 1        A   pending        None
 2        A    active        None
 3        A  resolved           1
 4        B   pending        None
 5        B    active        None
 6        B  resolved           4
 7        B  resolved           4

I tried using groupby() and apply(), but I'm struggling to apply the condition check across rows within each group.

Answer

A possible solution:

m = (df.groupby('group_id')
    .apply(lambda x: 
        x['related_id'].isin(x.index[x['status'].eq('active')]), 
        include_groups=False)
    .reset_index(drop=True))

df.loc[m, 'status'] = 'resolved'

Output:

   id group_id    status  related_id
 1        A   pending         NaN
 2        A    active         NaN
 3        A  resolved         1.0
 4        B   pending         NaN
 5        B    active         NaN
 6        B  resolved         4.0
 7        B  resolved         4.0

Update Values in a Column Based on Conditions Across Multiple Rows Within Groups

Answer

Related Articles

How to configure VS Code launch.json to debug a Streamlit app with a conda environment?

How can I make auto animation using flet?

how to display metadata in scatter plot in matplotlib?

How to select certain indices for multiple dimensions of a PyTorch tensor?

How to type hint `flax.linen.Module.apply`'s output correctly?

'Command python: select interpreter resulted in error