How to pass several variables to a pandas groupby

This code works:

cohort = r'priority' 
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

This code also works:

cohort = r'impactedservice' 
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

And so does this:

result2025 = df.groupby(['impactedservice','priority'],as_index=False).agg({'resolvetime': ['count','mean']})

What does not work for me is defining the cohort variable as:

cohort = r'impactedservice,priority' # a double-cohort
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

That gives this error:

KeyError: 'impactedservice,priority'

How do I properly define the cohort variable in this case?

Answer

The issue is that when you do:

cohort = r'impactedservice,priority'

You're creating a single string, not a list of column names. Wrapping it as [cohort] gives a one-element list, so pandas looks for one column literally named 'impactedservice,priority'. No such column exists, hence the KeyError.

Correct way: define cohort as a list of column names and pass it to groupby directly, without wrapping it in another list:

cohort = ['impactedservice', 'priority']
result2025 = df.groupby(cohort, as_index=False).agg({'resolvetime': ['count', 'mean']})
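To see the difference end to end, here is a minimal sketch on a made-up DataFrame. Only the column names come from the question; the rows and the names `key_error_raised` and `result` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "impactedservice": ["email", "email", "vpn", "vpn"],
    "priority": ["P1", "P2", "P1", "P1"],
    "resolvetime": [4.0, 8.0, 2.0, 6.0],
})

# A single comma-separated string is looked up as ONE column name and fails.
key_error_raised = False
try:
    df.groupby(["impactedservice,priority"], as_index=False)
except KeyError:
    key_error_raised = True

# A list of column names groups by each column in turn.
cohort = ["impactedservice", "priority"]
result = df.groupby(cohort, as_index=False).agg({"resolvetime": ["count", "mean"]})
print(result)
```

Each distinct (impactedservice, priority) pair in the sample data becomes one row of the result.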

Now groupby knows you're grouping by multiple columns. You can build cohort dynamically as a list too if needed:

use_priority = True  # example flag; set it however your code decides
cohort = ['impactedservice']
if use_priority:
    cohort.append('priority')
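If the cohort really has to arrive as one comma-separated string (from a config file, say), one option is to split it into a list before grouping. The sketch below assumes that setup; `to_cohort` is a hypothetical helper, not part of pandas, and the sample rows are made up.

```python
import pandas as pd


def to_cohort(spec):
    """Return a list of column names from a comma-separated string or a list.

    Hypothetical helper for illustration; not a pandas function.
    """
    if isinstance(spec, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [name.strip() for name in spec.split(",") if name.strip()]
    return list(spec)


# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "impactedservice": ["email", "email", "vpn", "vpn"],
    "priority": ["P1", "P2", "P1", "P1"],
    "resolvetime": [4.0, 8.0, 2.0, 6.0],
})

single = to_cohort("priority")                    # ['priority']
double = to_cohort("impactedservice, priority")   # ['impactedservice', 'priority']

result_single = df.groupby(single, as_index=False).agg({"resolvetime": ["count", "mean"]})
result_double = df.groupby(double, as_index=False).agg({"resolvetime": ["count", "mean"]})
```

This way the same groupby line handles single and double cohorts, since it always receives a list.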
