maxframe.dataframe.DataFrame.compare

DataFrame.compare(other, align_axis: int | str = 1, keep_shape: bool = False, keep_equal: bool = False, result_names: Tuple[str, str] = ('self', 'other'))

Compare to another DataFrame and show the differences.

Parameters:

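other (DataFrame) -- Object to compare with.
align_axis ({0 or 'index', 1 or 'columns'}, default 1) -- Determines which axis the comparison is aligned on.
    * 0, or 'index': the resulting differences are stacked vertically, with rows drawn alternately from self and other.
    * 1, or 'columns': the resulting differences are aligned horizontally, with columns drawn alternately from self and other.
keep_shape (bool, default False) -- If true, all rows and columns are kept. Otherwise, only the rows and columns with differing values are kept.
keep_equal (bool, default False) -- If true, equal values are kept in the result. Otherwise, equal values are shown as NaN.
result_names (tuple, default ('self', 'other')) -- Names to use for the two DataFrames in the comparison.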

Returns:

DataFrame that shows the differences stacked side by side. The resulting index will be a MultiIndex with 'self' and 'other' stacked alternately at the inner level.

Return type:

DataFrame

Raises:

ValueError -- When the two DataFrames do not have identical labels or shape.

See also

Series.compare: Compare with another Series and show the differences.
DataFrame.equals: Test whether two objects contain the same elements.

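Notes

Matching NaNs will not appear as a difference.

Only DataFrames with identical labels (that is, the same shape and identical row and column labels) can be compared.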
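
Both notes can be checked with a small sketch. It uses plain pandas rather than MaxFrame so that it runs locally without a session. That MaxFrame follows the same semantics is an assumption here, based on this page mirroring the pandas docstring. The frame names `left`, `right`, and `mismatched` are illustrative only.

```python
import numpy as np
import pandas as pd

# Two frames with identical labels; row 1 of column "a" is NaN on both sides.
left = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"a": [1.0, np.nan, 30.0], "b": [1.0, 2.0, 3.0]})

# The matching NaN is not reported; only row 2 of column "a" differs.
diff = left.compare(right)
diff_rows = list(diff.index)
diff_columns = list(diff.columns)

# A frame whose column labels differ cannot be compared.
mismatched = right.rename(columns={"b": "c"})
try:
    left.compare(mismatched)
    raised = False
except ValueError:
    raised = True
```

With MaxFrame, the frames would instead be built with `md.DataFrame` and the comparison result fetched with `.execute()`, as in the examples on this page.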

Examples

>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, mt.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df.execute()
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2.execute()
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns

>>> df.compare(df2).execute()
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows

>>> df.compare(df2, align_axis=0).execute()
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values

>>> df.compare(df2, keep_equal=True).execute()
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns

>>> df.compare(df2, keep_shape=True).execute()
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns, as well as all original values

>>> df.compare(df2, keep_shape=True, keep_equal=True).execute()
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
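
Use custom result names

None of the examples on this page exercise `result_names`, so here is a minimal sketch of it. It is written against plain pandas (1.5 or later, where the parameter exists) as a local stand-in; equivalent MaxFrame behavior is assumed from the signature rather than verified. The labels "before" and "after" and the frame names are arbitrary illustration choices.

```python
import pandas as pd

# Two small frames that differ only at row 1 of "col3".
before = pd.DataFrame({"col1": ["a", "b"], "col3": [1.0, 3.0]})
after = before.copy()
after.loc[1, "col3"] = 4.0

# The inner column level uses the supplied names instead of 'self' and 'other'.
diff = before.compare(after, result_names=("before", "after"))
column_labels = list(diff.columns)
row_labels = list(diff.index)
```

Descriptive names like these can make the output easier to read when the two frames represent, say, two snapshots of the same table.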