maxframe.dataframe.groupby.GroupBy.agg

GroupBy.agg(func=None, method='auto', *args, **kwargs)

Aggregate using one or more operations on grouped data.

Parameters:
  • groupby (MaxFrame GroupBy) – Grouped data to aggregate.

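  • func (str, list-like or dict) – Aggregation function(s) to apply: a single function name, a list of names, or a dict mapping column names to one or more function names.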

  • method ({'auto', 'shuffle', 'tree'}, default 'auto') – Aggregation strategy. ‘tree’ generally performs better, ‘shuffle’ is recommended when the aggregated result is very large, and ‘auto’ uses ‘shuffle’ in distributed mode and ‘tree’ in local mode.
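
The difference between the ‘tree’ and ‘shuffle’ strategies can be illustrated with a small, library-free sketch. This is a conceptual model only, not MaxFrame's implementation: the helper names `tree_sum` and `shuffle_sum`, the two-chunk layout, and the hash-based partitioning are all assumptions made for illustration. It assumes ‘tree’ means pre-aggregating each chunk and merging the partial results, and ‘shuffle’ means routing rows by key so that each partition aggregates whole groups.

```python
from collections import defaultdict

# Rows of (group key, value), split into two chunks (hypothetical layout).
chunks = [
    [(1, 1), (1, 2)],
    [(2, 3), (2, 4), (1, 5)],
]


def tree_sum(chunks):
    """Pre-aggregate inside each chunk, then merge the partial results."""
    partials = []
    for chunk in chunks:
        partial = defaultdict(int)
        for key, value in chunk:
            partial[key] += value
        partials.append(partial)
    # Combine step: every partial result is merged in one place.
    merged = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            merged[key] += subtotal
    return dict(merged)


def shuffle_sum(chunks, n_partitions=2):
    """Route rows by key so each partition owns whole groups, then aggregate."""
    partitions = [[] for _ in range(n_partitions)]
    for chunk in chunks:
        for key, value in chunk:
            partitions[hash(key) % n_partitions].append((key, value))
    # Each partition aggregates independently; results are disjoint by key.
    result = {}
    for partition in partitions:
        totals = defaultdict(int)
        for key, value in partition:
            totals[key] += value
        result.update(totals)
    return result


tree_result = tree_sum(chunks)
shuffle_result = shuffle_sum(chunks)
```

Both helpers return the same per-group sums. Under these assumptions, the tree version moves only small partial results but merges them in a single place, while the shuffle version moves every row but keeps the final result spread across partitions, which is consistent with the recommendation to use ‘shuffle’ when the aggregated result is very large.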

Returns:

Aggregated result.

Return type:

Series or DataFrame

Examples

>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
...     {
...         "A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860],
...     }
... )
>>> df.execute()
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

The aggregation is for each column.

>>> df.groupby('A').agg('min').execute()
   B         C
A
1  1  0.227877
2  3 -0.562860

Multiple aggregations.

>>> df.groupby('A').agg(['min', 'max']).execute()
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Different aggregations per column.

>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'}).execute()
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907

To control the output names when applying different aggregations per column, MaxFrame supports pandas-style “named aggregation”.

>>> from maxframe.dataframe.groupby import NamedAgg
>>> df.groupby("A").agg(
...  b_min=NamedAgg(column="B", aggfunc="min"),
...  c_sum=NamedAgg(column="C", aggfunc="sum")).execute()
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907