maxframe.dataframe.DataFrame.mf.flatmap
- DataFrame.mf.flatmap(func: Callable, dtypes=None, raw=False, args=(), **kwargs)
Apply the given function to each row and then flatten results. Use this method if your transformation returns multiple rows for each input row.
This function applies a transformation to each row of the DataFrame, where the transformation can return zero or multiple values, effectively flattening Python generators, list-like collections, and DataFrames.
- Parameters:
func (Callable) – Function to apply to each row of the DataFrame. It should accept a Series (or an array if raw=True) representing a row and return a list or iterable of values.
dtypes (Series, dict or list) – Specify dtypes of returned DataFrame.
raw (bool, default False) – Determines if the row is passed as a Series or as a numpy array:
False: passes each row as a Series to the function.
True: the passed function will receive numpy array objects instead.
args (tuple) – Positional arguments to pass to func.
**kwargs – Additional keyword arguments to pass to func.
- Returns:
Return DataFrame with specified dtypes.
- Return type:
DataFrame
Notes
The func must return an iterable of values for each input row. The index of the resulting DataFrame will be repeated based on the number of output rows generated by func.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> df.execute()
   A  B
0  1  4
1  2  5
2  3  6
Define a function that takes a row and returns a list of two values, each of which becomes one output row:
>>> def generate_values_array(row):
...     return [row['A'] * 2, row['B'] * 3]
Define a function that takes a row and yields two rows of two columns each:
>>> def generate_values_in_generator(row):
...     yield [row[0] * 2, row[1] * 4]
...     yield [row[0] * 3, row[1] * 5]
This is equivalent to the following function, which returns a DataFrame:
>>> def generate_values_in_dataframe(row):
...     return pd.DataFrame([[row[0] * 2, row[1] * 4], [row[0] * 3, row[1] * 5]])
Specify dtypes of the output when applying a function that returns a list:
>>> df.mf.flatmap(generate_values_array, dtypes=pd.Series({'A': 'int'})).execute()
    A
0   2
0  12
1   4
1  15
2   6
2  18
Specify raw=True to pass input row as array:
>>> df.mf.flatmap(generate_values_in_generator, dtypes={"A": "int", "B": "int"}, raw=True).execute()
   A   B
0  2  16
0  3  20
1  4  20
1  6  25
2  6  24
2  9  30
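The flattening behavior described in the Notes (the index is repeated once per output row, and a row that produces nothing contributes no output) can be illustrated with a minimal pure-Python sketch. This is not MaxFrame's implementation: `flatmap_rows` and `scaled_if_even` are hypothetical names introduced only for illustration, rows are modeled as plain dicts rather than Series, and the way `args` and `**kwargs` are forwarded to `func` is assumed from the parameter descriptions above.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def flatmap_rows(
    rows: Iterable[Tuple[Any, dict]],
    func: Callable[..., Iterable[Any]],
    args: tuple = (),
    **kwargs: Any,
) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical helper: yield one (index, output) pair per value func produces."""
    for index, row in rows:
        # func may return a list, a generator, or any other iterable.
        for output in func(row, *args, **kwargs):
            # The input row's index label is repeated for every output row.
            yield index, output


def scaled_if_even(row: dict, factor: int, offset: int = 0):
    """Hypothetical func: emits two values when 'A' is even, nothing otherwise."""
    if row["A"] % 2 == 0:
        yield row["A"] * factor + offset
        yield row["B"] * factor + offset


# Same data as the DataFrame in the examples, modeled as (index, row) pairs.
rows = [(0, {"A": 1, "B": 4}), (1, {"A": 2, "B": 5}), (2, {"A": 3, "B": 6})]

# args supplies positional arguments, offset is passed through **kwargs.
result = list(flatmap_rows(rows, scaled_if_even, args=(10,), offset=1))
print(result)  # [(1, 21), (1, 51)]
```

Only the row with index 1 has an even 'A', so it is the only one that appears in the result, and its index label appears twice because the function yields two values for it.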