How to pass several column names in a variable to a pandas groupby

This code works:
cohort = r'priority'
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})
and this code works
cohort = r'impactedservice'
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})
and this code works
result2025 = df.groupby(['impactedservice','priority'],as_index=False).agg({'resolvetime': ['count','mean']})
but what is not working for me is defining the cohort variable to be
cohort = r'impactedservice,priority' # a double-cohort
result2025 = df.groupby([cohort],as_index=False).agg({'resolvetime': ['count','mean']})
That gives the error:
KeyError: 'impactedservice,priority'
How do I properly define the cohort variable in this case?
Answer
The issue is that when you do:
cohort = r'impactedservice,priority'
You're creating a single string, not a list of column names. Pandas treats that as a single column name (which doesn’t exist), hence the KeyError.
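To see this concretely, here is a minimal sketch on a toy DataFrame. The column names come from the question; the data values are invented purely for illustration.

```python
import pandas as pd

# Toy data: values are made up, only the column names match the question.
df = pd.DataFrame({
    'impactedservice': ['email', 'email', 'vpn'],
    'priority': ['P1', 'P2', 'P1'],
    'resolvetime': [4.0, 6.0, 10.0],
})

cohort = r'impactedservice,priority'

# Wrapping the string in brackets gives a one-element list,
# whose only element is a string that happens to contain a comma.
keys = [cohort]
print(len(keys))
print(cohort in df.columns)

# groupby looks for a column literally named 'impactedservice,priority'.
try:
    df.groupby(keys, as_index=False).agg({'resolvetime': ['count', 'mean']})
    raised = False
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

The comma inside the string has no special meaning to pandas, so the lookup fails before any aggregation happens.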
Correct way:
Define cohort as a list of column names:
cohort = ['impactedservice', 'priority']
result2025 = df.groupby(cohort, as_index=False).agg({'resolvetime': ['count', 'mean']})
Now groupby knows you're grouping by multiple columns. You can also build cohort dynamically as a list if needed:
cohort = ['impactedservice']
# use_priority is assumed to be a boolean flag you define elsewhere
if use_priority:
    cohort.append('priority')
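If the cohort really has to arrive as a comma-separated string (say, from a config file or a command-line argument), one option is to split it into a list before passing it to groupby. The sketch below assumes that no column name itself contains a comma; `to_columns` is a hypothetical helper name, not part of pandas, and the data values are invented for illustration.

```python
import pandas as pd

def to_columns(cohort):
    """Hypothetical helper: turn 'a,b' or ['a', 'b'] into a list of column names."""
    if isinstance(cohort, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [name.strip() for name in cohort.split(',') if name.strip()]
    return list(cohort)

# Toy data: values are made up, only the column names match the question.
df = pd.DataFrame({
    'impactedservice': ['email', 'email', 'vpn'],
    'priority': ['P1', 'P2', 'P1'],
    'resolvetime': [4.0, 6.0, 10.0],
})

cohort = 'impactedservice,priority'  # a double-cohort, kept as a string
result2025 = df.groupby(to_columns(cohort), as_index=False).agg({'resolvetime': ['count', 'mean']})
print(result2025)
```

The same helper accepts a single name such as 'priority' or a list that is already split, so the groupby call stays identical for single and double cohorts.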