maxframe.dataframe.DataFrame.apply

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), dtypes=None, dtype=None, name=None, output_type=None, index=None, elementwise=None, skip_infer=False, check_output_dtypes=None, **kwds)

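Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.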

Parameters:

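func (function) -- Function to apply to each column or row.
axis ({0 or 'index', 1 or 'columns'}, default 0) -- Axis along which the function is applied:
    * 0 or 'index': apply the function to each column.
    * 1 or 'columns': apply the function to each row.
raw (bool, default False) -- Determines whether the row or column is passed as a Series or an ndarray object:
    * False: pass each row or column to the function as a Series.
    * True: the passed function receives ndarray objects instead. If you are only applying a NumPy reduction function, this gives much better performance.
result_type ({'expand', 'reduce', 'broadcast', None}, default None) -- These options only take effect when ``axis=1`` (columns):
    * 'expand': list-like results are turned into columns.
    * 'reduce': return a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
    * 'broadcast': results are broadcast to the original shape of the DataFrame; the original index and columns are retained.
    The default behaviour (None) depends on the return value of the applied function: list-like results are returned as a Series of those values. However, if the applied function returns a Series, it is expanded to columns.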
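output_type ({'dataframe', 'series'}, default None) -- Specify the type of the returned object. See Notes for details.
dtypes (Series, default None) -- Specify the dtypes of the returned DataFrame. See Notes for details.
dtype (numpy.dtype, default None) -- Specify the dtype of the returned Series. See Notes for details.
name (str, default None) -- Specify the name of the returned Series. See Notes for details.
index (Index, default None) -- Specify the index of the returned object. See Notes for details.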
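elementwise (bool, default False) -- Specify whether func is an elementwise function:
    * False: the function is not elementwise. MaxFrame tries to concatenate chunks by rows (axis=0) or by columns (axis=1) and then applies func to the concatenated chunks. The concatenation step can add latency.
    * True: the function is elementwise. MaxFrame applies func to the original chunks directly. This avoids the extra concatenation step and reduces overhead.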
skip_infer (bool, default False) -- Whether to skip dtype inference when dtypes or output_type is not specified.
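check_output_dtypes (str, default None) -- Validation mode for the output dtypes and columns:
    * 'ignore': perform no validation.
    * 'warns': validate and show a warning on mismatch (the default when None).
    * 'raises': validate and raise an error on mismatch.
args (tuple) -- Positional arguments to pass to func in addition to the array/series.
**kwds -- Additional keyword arguments to pass to func.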

Returns:

Result of applying func along the given axis of the DataFrame.

Return type:

Series or DataFrame

See also

DataFrame.applymap: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.

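Notes

When deciding the output dtypes and the shape of the return value, MaxFrame tries to apply func to a mock DataFrame, and the apply call may fail at this point. When this happens, you need to specify the type of the apply result (DataFrame or Series) in output_type.

For DataFrame output, you need to specify a list or a pandas Series as the dtypes of the output DataFrame. The index of the output can also be specified.
For Series output, you need to specify the dtype and name of the output Series.
Any input whose dtype is pandas.ArrowDtype(pyarrow.MapType) is always converted to a Python dict. Any output with this dtype must also be returned as a Python dict.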
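The Series case described in these notes can be sketched as follows. This is a minimal illustration, not the library's prescribed pattern: `row_total` and `apply_row_total` are hypothetical helper names, and `df` is assumed to be a MaxFrame DataFrame with numeric columns `A` and `B`. Only the `output_type`, `dtype`, and `name` keywords come from the signature above.

```python
import numpy as np


def row_total(row):
    """Return the integer sum of columns A and B for one row."""
    return int(row["A"] + row["B"])


def apply_row_total(df):
    """Apply row_total row by row, declaring the output metadata up front.

    `df` is assumed to be a maxframe.dataframe.DataFrame (hypothetical input).
    Declaring output_type, dtype and name means the result does not depend on
    what the mock-DataFrame inference step would have deduced.
    """
    return df.apply(
        row_total,
        axis=1,
        output_type="series",     # the result is a Series, one value per row
        dtype=np.dtype("int64"),  # dtype of that Series
        name="total",             # name of that Series
    )
```

With the `df` built in the Examples section, `apply_row_total(df).execute()` would then materialize the result.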

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0).execute()
A    12
B    27
dtype: int64

>>> df.apply(lambda row: int(np.sum(row)), axis=1).execute()
0    13
1    13
2    13
dtype: int64
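The effect of the raw parameter on what func receives can be illustrated with plain pandas. This sketch assumes MaxFrame follows the same pandas semantics for raw; it runs locally on a pandas DataFrame rather than a MaxFrame one, and `column_sum` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Plain pandas stand-in for the MaxFrame DataFrame used in the examples.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

seen_types = []


def column_sum(values):
    # Record what kind of object apply() handed over, then reduce it.
    seen_types.append(type(values).__name__)
    return values.sum()


# raw=False (default): each column arrives as a Series.
as_series = df.apply(column_sum, axis=0)
series_types = set(seen_types)

seen_types.clear()

# raw=True: each column arrives as a NumPy ndarray.
as_ndarray = df.apply(column_sum, axis=0, raw=True)
ndarray_types = set(seen_types)

print(series_types, ndarray_types)
print(as_ndarray.to_dict())
```

Both calls compute the same column sums; only the type of the object passed to the function differs.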

Passing result_type='expand' expands list-like results into columns of a DataFrame

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand').execute()
   0  1
0  1  2
1  1  2
2  1  2
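For comparison, the default behaviour (result_type=None) with the same list-returning function can be seen in plain pandas. The sketch assumes MaxFrame mirrors pandas here and uses a local pandas DataFrame, not a MaxFrame one.

```python
import pandas as pd

# Plain pandas stand-in for the MaxFrame DataFrame used in the examples.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Default (result_type=None): each list is kept whole, giving a Series of lists.
as_lists = df.apply(lambda row: [1, 2], axis=1)

# 'expand': the same lists are spread over columns 0 and 1 of a DataFrame.
expanded = df.apply(lambda row: [1, 2], axis=1, result_type="expand")

print(type(as_lists).__name__, as_lists.iloc[0])
print(type(expanded).__name__, list(expanded.columns))
```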

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1).execute()
   foo  bar
0    1    2
1    1    2
2    1    2

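Passing result_type='broadcast' ensures the result keeps the same shape, whether the function returns a list-like or a scalar, and broadcasts it along the axis. The resulting column names will be the original ones.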

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast').execute()
   A  B
0  1  2
1  1  2
2  1  2

Create a dataframe with a map type.

>>> import pyarrow as pa
>>> import pandas as pd
>>> from maxframe.lib.dtypes_extension import dict_
>>> col_a = pd.Series(
...     data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
...     index=[1, 2, 3],
...     dtype=dict_(pa.string(), pa.int64()),
... )
>>> col_b = pd.Series(
...     data=["A", "B", "C"],
...     index=[1, 2, 3],
... )
>>> df = md.DataFrame({"A": col_a, "B": col_b})
>>> df.execute()
                        A  B
1  [('k1', 1), ('k2', 2)]  A
2             [('k1', 3)]  B
3                    <NA>  C

Define a function that updates the map type with a new key-value pair.

>>> def custom_set_item(x):
...     if x["A"] is not None:
...         x["A"]["k2"] = 10
...     return x

>>> df.apply(
...     custom_set_item,
...     axis=1,
...     output_type="dataframe",
...     dtypes=df.dtypes.copy(),
... ).execute()
                         A  B
1  [('k1', 1), ('k2', 10)]  A
2  [('k1', 3), ('k2', 10)]  B
3                     <NA>  C
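Because map-typed values reach func as Python dicts, a function like custom_set_item can be checked locally on plain dicts before it is run through apply. The sketch below redefines the function so it is self-contained; the `rows` list is hypothetical test data that uses plain dicts as stand-ins for the rows MaxFrame would pass.

```python
def custom_set_item(x):
    # Map values arrive as Python dicts; a missing map arrives as None.
    if x["A"] is not None:
        x["A"]["k2"] = 10
    return x


# Hypothetical stand-ins for the three rows of the example DataFrame.
rows = [
    {"A": {"k1": 1, "k2": 2}, "B": "A"},
    {"A": {"k1": 3}, "B": "B"},
    {"A": None, "B": "C"},
]

results = [custom_set_item(row) for row in rows]

for row in results:
    print(row)
```

Note that the function mutates the dict in place and returns the same row object, matching the example above.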