how to pass several variables in for a pandas groupby

python
Ethan Jackson

This code works:

cohort = r'priority'
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

and this code works:

cohort = r'impactedservice'
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

and this code works:

result2025 = df.groupby(['impactedservice','priority'],as_index=False).agg({'resolvetime': ['count','mean']})

But what does not work for me is defining the cohort variable as:

cohort = r'impactedservice,priority'  # a double-cohort
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})

That gives the error:

KeyError: 'impactedservice,priority'

How do I properly define the cohort variable in this case?

Answer

The issue is that when you do:

cohort = r'impactedservice,priority'

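You're creating a single string, not a list of column names. The r prefix only changes how backslashes are read; it does not split the string at the comma. Wrapping it as [cohort] therefore gives a one-element list, and pandas looks for one column literally named 'impactedservice,priority', which doesn't exist, hence the KeyError.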
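To see this in isolation, here is a minimal sketch on a made-up DataFrame. Only the column names come from the question; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "impactedservice": ["email", "email", "vpn"],
    "priority": ["P1", "P2", "P1"],
    "resolvetime": [4.0, 6.0, 10.0],
})

cohort = "impactedservice,priority"  # one string that happens to contain a comma

# [cohort] is a list with exactly one element, not two column names.
keys = [cohort]
print(len(keys))

try:
    df.groupby(keys, as_index=False).agg({"resolvetime": ["count", "mean"]})
    raised = False
except KeyError as exc:
    # pandas cannot find a column with that literal name.
    raised = True
    print("KeyError:", exc)
```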

Correct way: Define cohort as a list of column names:

cohort = ['impactedservice', 'priority']
result2025 = df.groupby(cohort, as_index=False).agg({'resolvetime': ['count', 'mean']})
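As a quick check, the list form can be run end to end on made-up data. The rows below are an assumption for illustration; the column names and the groupby call follow the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "impactedservice": ["email", "email", "email", "vpn"],
    "priority": ["P1", "P1", "P2", "P1"],
    "resolvetime": [4.0, 6.0, 3.0, 10.0],
})

# A real list of two column names, passed to groupby directly (no extra brackets).
cohort = ["impactedservice", "priority"]
result2025 = df.groupby(cohort, as_index=False).agg({"resolvetime": ["count", "mean"]})

# One row per distinct (impactedservice, priority) pair.
print(result2025)
```

Because agg receives a list of functions, the result has two-level column labels, so the count column is reached as result2025['resolvetime']['count'].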

Now groupby knows you're grouping by multiple columns. You can build cohort dynamically as a list too if needed:

cohort = ['impactedservice']
if use_priority:  # use_priority is a boolean flag you define elsewhere
    cohort.append('priority')
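If you would rather keep writing the cohort as a comma-separated string, one option is to split it into a list before grouping. The helper name as_column_list below is hypothetical, not part of pandas, and this sketch assumes no column name itself contains a comma.

```python
import pandas as pd

def as_column_list(cohort):
    """Hypothetical helper: turn 'a,b' or ['a', 'b'] into a list of column names."""
    if isinstance(cohort, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [name.strip() for name in cohort.split(",") if name.strip()]
    return list(cohort)

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "impactedservice": ["email", "email", "vpn"],
    "priority": ["P1", "P2", "P1"],
    "resolvetime": [4.0, 6.0, 10.0],
})

# The same call now works for a single column, a comma-separated string, or a list.
for cohort in ("priority", "impactedservice,priority", ["impactedservice", "priority"]):
    columns = as_column_list(cohort)
    result = df.groupby(columns, as_index=False).agg({"resolvetime": ["count", "mean"]})
    print(columns, "->", len(result), "groups")
```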
