maxframe.dataframe.groupby.DataFrameGroupBy.mf.apply_chunk
- DataFrameGroupBy.mf.apply_chunk(func: str | Callable, batch_rows=None, *, dtypes=None, dtype=None, name=None, output_type=None, index=None, skip_infer=False, order_cols=None, ascending=True, prepend_index_group_keys=False, args=(), **kwargs)
Apply function func group-wise and combine the results together. The pandas DataFrame given to the function is a chunk of the input DataFrame, treated as a batch of rows.
The function passed to apply_chunk must take a DataFrame as its first argument and return a DataFrame, Series or scalar. apply_chunk then combines the results back into a single DataFrame or Series, which makes it a highly flexible grouping method.
Do not expect the function to receive all rows of the DataFrame: which rows it receives depends on the implementation of MaxFrame and the internal running state of MaxCompute.
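The batching described above can be pictured with a small pandas-only sketch. It is an illustration, not MaxFrame's implementation: `emulate_apply_chunk` and `add_batch_size` are hypothetical names, and the splitting into consecutive batches is an assumption made for demonstration. The group column is dropped before the call to mirror the documented behavior of excluding group keys from the function input.

```python
import pandas as pd


def emulate_apply_chunk(df, by, func, batch_rows):
    """Hypothetical local stand-in: call func on row batches of each group."""
    pieces = []
    for _, group in df.groupby(by, sort=True):
        # Exclude the group column from what func sees.
        values = group.drop(columns=by)
        # Split each group into consecutive batches of at most batch_rows rows.
        for start in range(0, len(values), batch_rows):
            pieces.append(func(values.iloc[start:start + batch_rows]))
    return pd.concat(pieces)


def add_batch_size(chunk):
    # chunk is a plain pandas DataFrame holding only part of one group.
    return chunk.assign(rows_seen=len(chunk))


df = pd.DataFrame({"A": ["x", "x", "x", "y", "y"], "B": [1, 2, 3, 4, 5]})
result = emulate_apply_chunk(df, "A", add_batch_size, batch_rows=2)
print(result)
```

Group `x` has three rows, so the function is called once with two rows and once with the remaining row. A function that needs to see a whole group at once is therefore not a good fit for this method.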
- Parameters:
func (callable) – A callable that takes a dataframe as its first argument, and returns a dataframe, a series or a scalar. In addition, the callable may take positional and keyword arguments.
batch_rows (int) – Expected number of rows in a batch, which is also the length of the DataFrame passed to the function. A batch may contain fewer rows when the remaining data is insufficient.
output_type ({'dataframe', 'series'}, default None) – Specify type of returned object. See Notes for more details.
dtypes (Series, default None) – Specify dtypes of returned DataFrames. See Notes for more details.
dtype (numpy.dtype, default None) – Specify dtype of returned Series. See Notes for more details.
name (str, default None) – Specify name of returned Series. See Notes for more details.
index (Index, default None) – Specify index of returned object. See Notes for more details.
skip_infer (bool, default False) – Whether to skip inferring dtypes when dtypes or output_type is not specified.
prepend_index_group_keys (bool, default False) – If True and group_keys=True, the index of the returned DataFrame or Series will automatically contain group keys if as_index=True, or group indexes if as_index=False. It will also exclude group keys from user function inputs by default. See Notes for more details.
Note: prepend_index_group_keys will be set to True by default in future releases, and a warning will be shown if the parameter is set to False. To make sure your code works in future releases, please set this to True and remove group indexes from the index parameter or the type annotation of func.
args (tuple) – Optional positional arguments to pass to func.
kwargs (dict) – Optional keyword arguments to pass to func.
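As a rough illustration of how extra arguments reach func, the sketch below calls a user function directly on one pandas batch with the same `args` and keyword values that would be handed to apply_chunk. `clip_and_scale` and its parameters are hypothetical; the commented-out call only follows the signature shown above and assumes `df` is a MaxFrame DataFrame in an active session.

```python
import pandas as pd


def clip_and_scale(chunk, lower, upper, factor=1.0):
    # chunk: one batch of rows; lower/upper arrive via args, factor via kwargs.
    return chunk.clip(lower=lower, upper=upper) * factor


args = (0, 10)
kwargs = {"factor": 0.5}

# What a single invocation looks like from the function's point of view.
batch = pd.DataFrame({"B": [-5, 4, 20]})
out = clip_and_scale(batch, *args, **kwargs)
print(out)

# Assumed equivalent MaxFrame call (not executed here):
# df.groupby("A").mf.apply_chunk(
#     clip_and_scale, batch_rows=1000, args=(0, 10), factor=0.5
# )
```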
- Returns:
applied
- Return type:
See also
Series.apply – Apply a function to a Series.
DataFrame.apply – Apply a function to each row or column of a DataFrame.
DataFrame.mf.apply_chunk – Apply a function to row batches of a DataFrame.
Notes
When deciding output dtypes and shape of the return value, MaxFrame will try applying func onto a mock grouped object, and the apply call may fail. When this happens, you need to specify the type of apply call (DataFrame or Series) in output_type.
For DataFrame output, you need to specify a list or a pandas Series as dtypes of the output DataFrame.
For Series output, you need to specify dtype and name of the output Series.
index determines the index of the output DataFrame or Series. You may specify a dummy pandas index indicating the names and types of the index of the output of func, for instance, pd.MultiIndex.from_tuples([("a", 0)], names=["key1", "key2"]). If index is not supplied, the index of the input DataFrame or Series will be used.
When prepend_index_group_keys is True, the index of the returned object will be index prepended with group information, according to the as_index and group_keys arguments of the groupby function, which is consistent with pandas 3.0. When prepend_index_group_keys is False, you must specify a mock index with all fields, including group keys. As it is complicated to pass a full index definition, prepend_index_group_keys=False will be deprecated in the near future. Please supply prepend_index_group_keys=True where possible.
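The metadata arguments described in these notes can be prepared with plain pandas and NumPy. The sketch below is one possible way to do it, not a required pattern: `summarize` is a hypothetical user function, and the commented-out call assumes `df` is a MaxFrame DataFrame in an active session. Checking the declared metadata against a real call on a small pandas sample helps catch mismatches before the job is submitted.

```python
import numpy as np
import pandas as pd


def summarize(chunk):
    # Hypothetical user function: one summary row per batch.
    return pd.DataFrame({"total": [chunk["B"].sum()], "rows": [len(chunk)]})


# Column dtypes of the DataFrame returned by summarize (for `dtypes`).
dtypes = pd.Series(
    [np.dtype("int64"), np.dtype("int64")], index=["total", "rows"]
)

# Dummy index carrying only the name and type of the index summarize returns.
index = pd.Index([0], dtype="int64")

# For a function returning a Series, `dtype` and `name` play the same role.
series_dtype = np.dtype("float64")
series_name = "mean_b"

# Check the declared metadata against a real call on a small pandas sample.
sample_out = summarize(pd.DataFrame({"B": [1, 2, 3]}))
metadata_matches = (
    list(sample_out.columns) == list(dtypes.index)
    and all(sample_out[col].dtype == dtypes[col] for col in dtypes.index)
    and sample_out.index.dtype == index.dtype
)
print(metadata_matches)

# Assumed MaxFrame call using the metadata (not executed here):
# df.groupby("A").mf.apply_chunk(
#     summarize, batch_rows=1000, output_type="dataframe",
#     dtypes=dtypes, index=index, prepend_index_group_keys=True,
# )
```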
MaxFrame adopts the expected behavior of pandas>=3.0 by excluding group columns from user function input. If you still need a group column in your function input, select it explicitly right after the groupby call. For instance, df.groupby("A")[["A", "B", "C"]].mf.apply_chunk(func) will pass the data of column A into func.
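The same column-selection idea can be tried in plain pandas, which is used here only as an analogue and does not exercise MaxFrame itself. `sees_group_column` is a hypothetical helper that reports whether column A is present in the function input.

```python
import pandas as pd


def sees_group_column(chunk):
    # Report whether the grouping column is part of the function input.
    return "A" in chunk.columns


df = pd.DataFrame(
    {"A": ["x", "x", "y"], "B": [1, 2, 3], "C": [0.1, 0.2, 0.3]}
)

# Only B and C are selected, so func does not see column A.
without_key = df.groupby("A")[["B", "C"]].apply(sees_group_column)

# Selecting A explicitly after groupby passes it into func as well.
with_key = df.groupby("A")[["A", "B", "C"]].apply(sees_group_column)

print(without_key.tolist(), with_key.tolist())
```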