DataFrame

Constructor

DataFrame([data, index, columns, dtype, ...])

Attributes and underlying data

Axes

DataFrame.dtypes

Return the dtypes in the DataFrame.

DataFrame.memory_usage([index, deep])

Return the memory usage of each column in bytes.

DataFrame.ndim

Return an int representing the number of axes / array dimensions.

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame's columns based on the column dtypes.

DataFrame.shape

Return a tuple representing the dimensionality of the DataFrame.

Conversion

DataFrame.astype(dtype[, copy, errors])

Cast the DataFrame to the specified dtype.

DataFrame.convert_dtypes([infer_objects, ...])

Convert columns to best possible dtypes using dtypes supporting pd.NA.

DataFrame.copy()

Make a copy of this object's indices and data.

DataFrame.infer_objects([copy])

Attempt to infer better dtypes for object columns.

Indexing, iteration

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.head([n])

Return the first n rows.

DataFrame.iat

Access a single value for a row/column pair by integer position.

DataFrame.iloc

Purely integer-location based indexing for selection by position.

DataFrame.insert(loc, column, value[, ...])

Insert column into DataFrame at specified location.

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

DataFrame.mask(cond[, other, inplace, axis, ...])

Replace values where the condition is True.

DataFrame.pop(item)

Return the item and drop it from the frame.

DataFrame.query(expr[, inplace])

Query the columns of a DataFrame with a boolean expression.

DataFrame.tail([n])

Return the last n rows.

DataFrame.xs(key[, axis, level, drop_level])

Return cross-section from the Series/DataFrame.

DataFrame.where(cond[, other, inplace, ...])

Replace values where the condition is False.
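The where and mask entries above are mirror images of each other. The following minimal pure-Python sketch shows that relationship on a nested list standing in for a DataFrame; the helper names, the nested-list representation, and the use of None in place of NaN for the default other are assumptions for illustration, and label alignment of cond against the frame is omitted.

```python
from typing import Any, List

Grid = List[List[Any]]
BoolGrid = List[List[bool]]


def where(data: Grid, cond: BoolGrid, other: Any = None) -> Grid:
    """Keep each value where cond is True; replace it with `other` where cond is False."""
    return [
        [value if keep else other for value, keep in zip(row, cond_row)]
        for row, cond_row in zip(data, cond)
    ]


def mask(data: Grid, cond: BoolGrid, other: Any = None) -> Grid:
    """Inverse of where: replace each value where cond is True."""
    inverted = [[not flag for flag in cond_row] for cond_row in cond]
    return where(data, inverted, other)


data = [[1, -2], [-3, 4]]
cond = [[value > 0 for value in row] for row in data]
kept_positive = where(data, cond, 0)    # negatives replaced by 0
masked_positive = mask(data, cond, 0)   # positives replaced by 0
```

In other words, mask(cond) behaves like where(~cond) with the same other value.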

Binary operator functions

DataFrame.add(other[, axis, level, fill_value])

Get addition of DataFrame and other, element-wise (binary operator add).

DataFrame.sub(other[, axis, level, fill_value])

Get subtraction of DataFrame and other, element-wise (binary operator sub).

DataFrame.mul(other[, axis, level, fill_value])

Get multiplication of DataFrame and other, element-wise (binary operator mul).

DataFrame.div(other[, axis, level, fill_value])

Get floating division of DataFrame and other, element-wise (binary operator truediv).

DataFrame.truediv(other[, axis, level, ...])

Get floating division of DataFrame and other, element-wise (binary operator truediv).

DataFrame.floordiv(other[, axis, level, ...])

Get integer division of DataFrame and other, element-wise (binary operator floordiv).

DataFrame.mod(other[, axis, level, fill_value])

Get modulo of DataFrame and other, element-wise (binary operator mod).

DataFrame.pow(other[, axis, level, fill_value])

Get exponential power of DataFrame and other, element-wise (binary operator pow).

DataFrame.dot(other)

Compute the matrix multiplication between the DataFrame and other.

DataFrame.radd(other[, axis, level, fill_value])

Get addition of DataFrame and other, element-wise (binary operator radd).

DataFrame.rsub(other[, axis, level, fill_value])

Get subtraction of DataFrame and other, element-wise (binary operator rsub).

DataFrame.rmul(other[, axis, level, fill_value])

Get multiplication of DataFrame and other, element-wise (binary operator rmul).

DataFrame.rdiv(other[, axis, level, fill_value])

Get floating division of DataFrame and other, element-wise (binary operator rtruediv).

DataFrame.rtruediv(other[, axis, level, ...])

Get floating division of DataFrame and other, element-wise (binary operator rtruediv).

DataFrame.rfloordiv(other[, axis, level, ...])

Get integer division of DataFrame and other, element-wise (binary operator rfloordiv).

DataFrame.rmod(other[, axis, level, fill_value])

Get modulo of DataFrame and other, element-wise (binary operator rmod).

DataFrame.rpow(other[, axis, level, fill_value])

Get exponential power of DataFrame and other, element-wise (binary operator rpow).

DataFrame.lt(other[, axis, level, fill_value])

Get less-than comparison of DataFrame and other, element-wise (binary operator lt).

DataFrame.gt(other[, axis, level, fill_value])

Get greater-than comparison of DataFrame and other, element-wise (binary operator gt).

DataFrame.le(other[, axis, level, fill_value])

Get less-than-or-equal comparison of DataFrame and other, element-wise (binary operator le).

DataFrame.ge(other[, axis, level, fill_value])

Get greater-than-or-equal comparison of DataFrame and other, element-wise (binary operator ge).

DataFrame.ne(other[, axis, level, fill_value])

Get not-equal comparison of DataFrame and other, element-wise (binary operator ne).

DataFrame.eq(other[, axis, level, fill_value])

Get equal-to comparison of DataFrame and other, element-wise (binary operator eq).

DataFrame.combine(other, func[, fill_value, ...])

Perform column-wise combine with another DataFrame.

DataFrame.combine_first(other)

Update null elements with value in the same location in other.
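Two details of the arithmetic methods listed above are easy to misread: fill_value only substitutes for a value that is missing on one side (a label missing on both sides stays missing), and the r-prefixed variants swap the operands, so rsub(other) computes other minus the DataFrame. The sketch below illustrates both on a single column modeled as a label-to-value dict; the helper names and the use of None for NaN are assumptions, and this is not MaxFrame's implementation.

```python
import operator
from typing import Callable, Dict, Optional

Column = Dict[str, Optional[float]]


def flex_op(left: Column, right: Column,
            op: Callable[[float, float], float],
            fill_value: Optional[float] = None) -> Column:
    """Align on the union of labels, then apply op element-wise."""
    result: Column = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        # fill_value replaces a value missing on ONE side only.
        if fill_value is not None:
            if a is None and b is not None:
                a = fill_value
            elif b is None and a is not None:
                b = fill_value
        # Anything still missing propagates as missing (None stands in for NaN).
        result[label] = None if a is None or b is None else op(a, b)
    return result


def reverse(op: Callable[[float, float], float]) -> Callable[[float, float], float]:
    """Swap operands, as the r-prefixed methods do: rsub(a, b) == b - a."""
    return lambda a, b: op(b, a)


left = {"a": 1.0, "b": 2.0, "c": None}
right = {"a": 10.0, "c": None, "d": 4.0}
plain_add = flex_op(left, right, operator.add)
filled_add = flex_op(left, right, operator.add, fill_value=0.0)
filled_rsub = flex_op(left, right, reverse(operator.sub), fill_value=0.0)
```

Here label "c" stays missing even with fill_value=0.0, because neither side has a value for it.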

Function application, GroupBy & window

DataFrame.apply(func[, axis, raw, ...])

Apply a function along an axis of the DataFrame.

DataFrame.applymap(func[, na_action, ...])

Apply a function to a DataFrame element-wise.

DataFrame.agg([func, axis])

Aggregate using one or more operations over the specified axis.

DataFrame.aggregate([func, axis])

Aggregate using one or more operations over the specified axis.

DataFrame.ewm([com, span, halflife, alpha, ...])

Provide exponential weighted functions.

DataFrame.expanding([min_periods, shift, ...])

Provide expanding transformations.

DataFrame.groupby([by, level, as_index, ...])

Group DataFrame using a mapper or by a Series of columns.

DataFrame.map(func[, na_action, dtypes, ...])

Apply a function to a DataFrame element-wise.

DataFrame.rolling(window[, min_periods, ...])

Provide rolling window calculations.

DataFrame.transform(func[, axis, dtypes, ...])

Call func on self producing a DataFrame with transformed values.
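The rolling and expanding entries above differ mainly in how the window is chosen and in the default min_periods. The pure-Python sketch below computes a trailing-window mean, assuming pandas-like defaults (an integer window requires a full window unless min_periods is given; expanding defaults to min_periods=1). The function names are hypothetical, the input has no missing values, and None marks a position with too few observations.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int,
                 min_periods: Optional[int] = None) -> List[Optional[float]]:
    """Mean over the trailing `window` values; None if fewer than min_periods are available."""
    if min_periods is None:
        min_periods = window  # assumed default: require a full window
    out: List[Optional[float]] = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        out.append(sum(win) / len(win) if len(win) >= min_periods else None)
    return out


def expanding_mean(values: List[float], min_periods: int = 1) -> List[Optional[float]]:
    """An expanding window is a trailing window that always starts at the first row."""
    return rolling_mean(values, window=len(values), min_periods=min_periods)


series = [1, 2, 3, 4]
full_window = rolling_mean(series, window=2)
partial_window = rolling_mean(series, window=2, min_periods=1)
growing = expanding_mean(series)
```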

Computations / descriptive stats

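DataFrame.abs()

Return a Series/DataFrame with the absolute numeric value of each element.

DataFrame.all([axis, bool_only, skipna, ...])

Return whether all elements are True, potentially over an axis.

DataFrame.any([axis, bool_only, skipna, ...])

Return whether any element is True, potentially over an axis.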

DataFrame.clip([lower, upper, axis, inplace])

Trim values at input threshold(s).

DataFrame.count([axis, level, numeric_only])

Count non-NA cells for each column or row.

DataFrame.corr([method, min_periods])

Compute pairwise correlation of columns, excluding NA/null values.

DataFrame.corrwith(other[, axis, drop, method])

Compute pairwise correlation between rows or columns of the DataFrame and rows or columns of a Series or DataFrame.

DataFrame.cov([min_periods, ddof, numeric_only])

Compute pairwise covariance of columns, excluding NA/null values.

DataFrame.describe([percentiles, include, ...])

Generate descriptive statistics.

DataFrame.diff([periods, axis])

First discrete difference of element.

DataFrame.eval(expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

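DataFrame.max([axis, skipna, level, ...])

Return the maximum of the values over the requested axis.

DataFrame.mean([axis, skipna, level, ...])

Return the mean of the values over the requested axis.

DataFrame.median([axis, skipna, level, ...])

Return the median of the values over the requested axis.

DataFrame.min([axis, skipna, level, ...])

Return the minimum of the values over the requested axis.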

DataFrame.mode([axis, numeric_only, dropna, ...])

Get the mode(s) of each element along the selected axis.

DataFrame.nunique([axis, dropna])

Count distinct observations over requested axis.

DataFrame.pct_change([periods, fill_method, ...])

Percentage change between the current and a prior element.

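DataFrame.prod([axis, skipna, level, ...])

Return the product of the values over the requested axis.

DataFrame.product([axis, skipna, level, ...])

Return the product of the values over the requested axis.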

DataFrame.quantile([q, axis, numeric_only, ...])

Return values at the given quantile over requested axis.

DataFrame.rank([axis, method, numeric_only, ...])

Compute numerical data ranks (1 through n) along axis.

DataFrame.round([decimals])

Round a DataFrame to a variable number of decimal places.

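DataFrame.sem([axis, skipna, level, ddof, ...])

Return the unbiased standard error of the mean over the requested axis.

DataFrame.std([axis, skipna, level, ddof, ...])

Return the sample standard deviation over the requested axis.

DataFrame.sum([axis, skipna, level, ...])

Return the sum of the values over the requested axis.

DataFrame.value_counts([subset, normalize, ...])

Return a Series containing the frequency of each distinct row in the DataFrame.

DataFrame.var([axis, skipna, level, ddof, ...])

Return the unbiased variance over the requested axis.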
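For the diff and pct_change entries in this section, the underlying arithmetic is x[i] - x[i - periods] and x[i] / x[i - periods] - 1 respectively, with no result for the first periods rows. The sketch below spells that out in plain Python; the function names are hypothetical, None stands in for NaN, only non-negative periods are handled, and the prior values are assumed to be nonzero for pct_change.

```python
from typing import List, Optional


def diff(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """x[i] - x[i - periods]; the first `periods` positions have no prior element."""
    if periods < 0:
        raise ValueError("this sketch only handles periods >= 0")
    return [None if i < periods else values[i] - values[i - periods]
            for i in range(len(values))]


def pct_change(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """x[i] / x[i - periods] - 1, i.e. the fractional change from the prior element."""
    if periods < 0:
        raise ValueError("this sketch only handles periods >= 0")
    return [None if i < periods else values[i] / values[i - periods] - 1
            for i in range(len(values))]


squares = [1, 4, 9, 16]
step_one = diff(squares)
step_two = diff(squares, periods=2)
prices = [100.0, 110.0, 99.0]
changes = pct_change(prices)  # roughly [None, +10%, -10%]
```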

Reindexing / selection / label manipulation

DataFrame.add_prefix(prefix)

Prefix labels with string prefix.

DataFrame.add_suffix(suffix)

Suffix labels with string suffix.

DataFrame.align(other[, join, axis, level, ...])

Align two objects on their axes with the specified join method.

DataFrame.at_time(time[, axis])

Select values at a particular time of day (e.g., 9:30 AM).

DataFrame.between_time(start_time, end_time)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

DataFrame.drop([labels, axis, index, ...])

Drop specified labels from rows or columns.

DataFrame.drop_duplicates([subset, keep, ...])

Return DataFrame with duplicate rows removed.

DataFrame.droplevel(level[, axis])

Return Series/DataFrame with requested index / column level(s) removed.

DataFrame.duplicated([subset, keep, method])

Return boolean Series denoting duplicate rows.

DataFrame.filter([items, like, regex, axis])

Subset the DataFrame rows or columns according to the specified index labels.

DataFrame.head([n])

Return the first n rows.

DataFrame.idxmax([axis, skipna])

Return index of first occurrence of maximum over requested axis.

DataFrame.idxmin([axis, skipna])

Return index of first occurrence of minimum over requested axis.

DataFrame.reindex([labels, index, columns, ...])

Conform Series/DataFrame to new index with optional filling logic.

DataFrame.reindex_like(other[, method, ...])

Return an object with indices matching those of the other object.

DataFrame.rename([mapper, index, columns, ...])

Alter axes labels.

DataFrame.rename_axis([mapper, index, ...])

Set the name of the axis for the index or columns.

DataFrame.reset_index([level, drop, ...])

Reset the index, or a level of it.

DataFrame.sample([n, frac, replace, ...])

Return a random sample of items from an axis of object.

DataFrame.set_axis(labels[, axis, inplace])

Assign desired index to given axis.

DataFrame.set_index(keys[, drop, append, ...])

Set the DataFrame index using existing columns.

DataFrame.take(indices[, axis])

Return the elements in the given positional indices along an axis.

DataFrame.truncate([before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.
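The keep and subset parameters of duplicated and drop_duplicates, listed above, control which copy of a repeated row survives. The sketch below reproduces the three keep modes ("first", "last", and False) on a list of tuples, with subset given as column positions; the tuple representation and helper names are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple, Union

Row = Tuple[Hashable, ...]
Keep = Union[str, bool]


def _key(row: Row, subset: Optional[Sequence[int]]) -> Row:
    # Compare whole rows, or only the selected column positions.
    return row if subset is None else tuple(row[i] for i in subset)


def duplicated(rows: List[Row], subset: Optional[Sequence[int]] = None,
               keep: Keep = "first") -> List[bool]:
    """Flag rows that repeat an earlier (keep="first") or later (keep="last") row."""
    keys = [_key(row, subset) for row in rows]
    if keep is False:
        # keep=False marks every member of a duplicate group.
        counts: Dict[Row, int] = {}
        for key in keys:
            counts[key] = counts.get(key, 0) + 1
        return [counts[key] > 1 for key in keys]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    order = range(len(keys)) if keep == "first" else range(len(keys) - 1, -1, -1)
    seen = set()
    flags = [False] * len(keys)
    for i in order:
        flags[i] = keys[i] in seen
        seen.add(keys[i])
    return flags


def drop_duplicates(rows: List[Row], subset: Optional[Sequence[int]] = None,
                    keep: Keep = "first") -> List[Row]:
    """Keep the rows not flagged by duplicated(), in their original order."""
    flags = duplicated(rows, subset, keep)
    return [row for row, is_dup in zip(rows, flags) if not is_dup]


rows = [(1, "a"), (2, "b"), (1, "a"), (3, "b")]
```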

Missing data handling

DataFrame.dropna([axis, how, thresh, ...])

Remove missing values.

DataFrame.fillna([value, method, axis, ...])

Fill NA/NaN values using the specified method.

DataFrame.isna()

Detect missing values.

DataFrame.isnull()

Detect missing values.

DataFrame.notna()

Detect existing (non-missing) values.

DataFrame.notnull()

Detect existing (non-missing) values.
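The how and thresh options of dropna decide which rows are removed: how="any" drops a row with at least one missing value, how="all" drops a row only if every value is missing, and thresh keeps rows with at least that many non-missing values. The row-wise sketch below uses None as the missing marker; the function name is hypothetical, and in this sketch thresh simply overrides how when both are given.

```python
from typing import Any, List, Optional


def dropna_rows(rows: List[List[Any]], how: str = "any",
                thresh: Optional[int] = None) -> List[List[Any]]:
    """Row-wise dropna on a list of rows, treating None as missing."""
    kept: List[List[Any]] = []
    for row in rows:
        non_missing = sum(value is not None for value in row)
        if thresh is not None:
            keep = non_missing >= thresh       # need at least `thresh` real values
        elif how == "any":
            keep = non_missing == len(row)     # drop if anything is missing
        elif how == "all":
            keep = non_missing > 0             # drop only if everything is missing
        else:
            raise ValueError("how must be 'any' or 'all'")
        if keep:
            kept.append(row)
    return kept


table = [[1, 2, 3], [None, 2, 3], [None, None, None]]
```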

Reshaping, sorting, transposing

DataFrame.melt([id_vars, value_vars, ...])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

DataFrame.nlargest(n, columns[, keep])

Return the first n rows ordered by columns in descending order.

DataFrame.nsmallest(n, columns[, keep])

Return the first n rows ordered by columns in ascending order.

DataFrame.pivot(columns[, index, values])

Return reshaped DataFrame organized by given index / column values.

DataFrame.pivot_table([values, index, ...])

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.reorder_levels(order[, axis])

Rearrange index levels using input order.

DataFrame.sort_values(by[, axis, ascending, ...])

Sort by the values along either axis.

DataFrame.sort_index([axis, level, ...])

Sort object by labels (along an axis).

DataFrame.swaplevel([i, j, axis])

Swap levels i and j in a MultiIndex.

DataFrame.stack([level, dropna])

Stack the prescribed level(s) from columns to index.

DataFrame.unstack([level, fill_value])

Pivot a level of the (necessarily hierarchical) index labels.
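The melt entry above turns a wide table into a long one: the id_vars columns are repeated, and every value_vars column contributes one (variable, value) pair per row. The sketch below does this on a list of dict records, emitting all rows for the first value column before moving to the next, which matches pandas' ordering. The record representation and the function itself are illustrative assumptions; "variable" and "value" are the pandas default output names.

```python
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]


def melt(records: List[Record], id_vars: Sequence[str],
         value_vars: Optional[Sequence[str]] = None,
         var_name: str = "variable", value_name: str = "value") -> List[Record]:
    """Unpivot wide records into long (id..., variable, value) records."""
    if value_vars is None:
        # Default: every column that is not an identifier gets unpivoted.
        value_vars = [col for col in records[0] if col not in id_vars] if records else []
    long_rows: List[Record] = []
    for col in value_vars:              # one block of rows per value column
        for rec in records:
            row = {key: rec[key] for key in id_vars}
            row[var_name] = col
            row[value_name] = rec[col]
            long_rows.append(row)
    return long_rows


wide = [{"id": "x", "jan": 1, "feb": 2}, {"id": "y", "jan": 3, "feb": 4}]
long_format = melt(wide, id_vars=["id"])
```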

Combining / comparing / joining / merging

DataFrame.append(other[, ignore_index, ...])

Append rows of other to the end of caller, returning a new object.

DataFrame.assign(**kwargs)

Assign new columns to a DataFrame.

DataFrame.compare(other[, align_axis, ...])

Compare to another DataFrame and show the differences.

DataFrame.join(other[, on, how, lsuffix, ...])

Join columns of another DataFrame.

DataFrame.merge(right[, how, on, left_on, ...])

Merge DataFrame or named Series objects with a database-style join.

DataFrame.update(other[, join, overwrite, ...])

Modify in place using non-NA values from another DataFrame.
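A database-style join like the merge entry above is commonly implemented as a hash join: index one side by key, then probe it with the other. The sketch below shows that technique for how="inner" and how="left" on a single key column. It is not MaxFrame's distributed implementation. It also assumes every record on a side has the same columns, and that the two sides share no non-key column names (pandas would add _x/_y suffixes in that case).

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def merge(left: List[Record], right: List[Record], on: str,
          how: str = "inner") -> List[Record]:
    """Hash join on one key column; supports how='inner' and how='left'."""
    if how not in ("inner", "left"):
        raise ValueError("this sketch only supports how='inner' or how='left'")
    # Build phase: index the right side by key value.
    index: Dict[Any, List[Record]] = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    right_cols = [col for col in (right[0] if right else {}) if col != on]
    # Probe phase: walk the left side in order.
    joined: List[Record] = []
    for row in left:
        matches = index.get(row[on], [])
        for match in matches:
            joined.append({**row, **{col: match[col] for col in right_cols}})
        if not matches and how == "left":
            # Unmatched left rows are kept with missing right-hand columns.
            joined.append({**row, **{col: None for col in right_cols}})
    return joined


orders = [{"k": 1, "a": "L1"}, {"k": 2, "a": "L2"}]
lookup = [{"k": 1, "b": "R1"}, {"k": 1, "b": "R1b"}, {"k": 3, "b": "R3"}]
```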

Plotting

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
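An attribute that is both callable and a namespace can be built as an accessor object that defines __call__ and forwards to one method per plot kind. The sketch below shows one way to do that, assuming that plot(kind="bar") should behave like plot.bar() and that kind defaults to "line". The PlotAccessor and Frame classes are hypothetical stand-ins, not MaxFrame's classes. They record requests instead of drawing, so the code runs without a plotting backend.

```python
from typing import Any, Dict, List


class PlotAccessor:
    """Hypothetical accessor: callable, and also a namespace of per-kind methods."""

    _kinds = ("line", "bar", "hist")

    def __init__(self, data: Any) -> None:
        self._data = data
        self.calls: List[Dict[str, Any]] = []

    def __call__(self, kind: str = "line", **kwargs: Any) -> str:
        # plot(kind="bar", ...) dispatches to plot.bar(...)
        if kind not in self._kinds:
            raise ValueError(f"{kind!r} is not a supported plot kind")
        return getattr(self, kind)(**kwargs)

    def _render(self, kind: str, **kwargs: Any) -> str:
        # A real backend would draw here; this sketch only records the request.
        self.calls.append({"kind": kind, **kwargs})
        return kind

    def line(self, **kwargs: Any) -> str:
        return self._render("line", **kwargs)

    def bar(self, **kwargs: Any) -> str:
        return self._render("bar", **kwargs)

    def hist(self, **kwargs: Any) -> str:
        return self._render("hist", **kwargs)


class Frame:
    """Minimal stand-in for a DataFrame exposing the accessor as a property."""

    def __init__(self, data: Any) -> None:
        self.data = data

    @property
    def plot(self) -> PlotAccessor:
        return PlotAccessor(self.data)


plot = Frame({"a": [1, 2, 3]}).plot
plot(kind="bar", title="t")   # callable form
plot.hist(bins=5)             # namespace form
```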

DataFrame.plot.area(*args, **kwargs)

Draw a stacked area plot.

DataFrame.plot.bar(*args, **kwargs)

Vertical bar plot.

DataFrame.plot.barh(*args, **kwargs)

Make a horizontal bar plot.

DataFrame.plot.box(*args, **kwargs)

Make a box plot of the DataFrame columns.

DataFrame.plot.density(*args, **kwargs)

Generate Kernel Density Estimate plot using Gaussian kernels.

DataFrame.plot.hexbin(*args, **kwargs)

Generate a hexagonal binning plot.

DataFrame.plot.hist(*args, **kwargs)

Draw one histogram of the DataFrame's columns.

DataFrame.plot.kde(*args, **kwargs)

Generate Kernel Density Estimate plot using Gaussian kernels.

DataFrame.plot.line(*args, **kwargs)

Plot Series or DataFrame as lines.

DataFrame.plot.pie(*args, **kwargs)

Generate a pie plot.

DataFrame.plot.scatter(*args, **kwargs)

Create a scatter plot with varying marker point size and color.

Serialization / IO / conversion

DataFrame.from_dict(data[, orient, dtype, ...])

Construct DataFrame from dict of array-like or dicts.

DataFrame.from_records(data[, index, ...])

Convert structured or record ndarray to DataFrame.

DataFrame.to_clipboard(*[, excel, sep, ...])

Copy object to the system clipboard.

DataFrame.to_csv(path[, sep, na_rep, ...])

Write object to a comma-separated values (csv) file.

DataFrame.to_dict([orient, into, index, ...])

Convert the DataFrame to a dictionary.

DataFrame.to_json([path, orient, ...])

Convert the object to a JSON string.

DataFrame.to_odps_table(table[, partition, ...])

Write DataFrame object into a MaxCompute (ODPS) table.

DataFrame.to_pandas([session])

Convert the DataFrame to a pandas DataFrame.

DataFrame.to_parquet(path[, engine, ...])

Write a DataFrame to the binary Parquet format; each chunk is written to a separate Parquet file.
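The orient argument of to_dict, listed above, changes only the nesting of the output. The sketch below reproduces four pandas orients ("dict", "list", "records", "index") from a column-oriented dict plus an index list. The input representation and function are assumptions for illustration, and the other pandas orients ("series", "split", "tight") are not covered.

```python
from typing import Any, Dict, List


def to_dict(columns: Dict[str, List[Any]], index: List[Any], orient: str = "dict") -> Any:
    """Mimic the nesting produced by four DataFrame.to_dict orients."""
    if orient == "dict":      # {column -> {index -> value}}
        return {col: dict(zip(index, vals)) for col, vals in columns.items()}
    if orient == "list":      # {column -> [values]}
        return {col: list(vals) for col, vals in columns.items()}
    if orient == "records":   # [{column -> value}, ...], one dict per row
        return [{col: columns[col][i] for col in columns} for i in range(len(index))]
    if orient == "index":     # {index -> {column -> value}}
        return {label: {col: columns[col][i] for col in columns}
                for i, label in enumerate(index)}
    raise ValueError(f"orient {orient!r} is not covered by this sketch")


cols = {"a": [1, 2], "b": [3, 4]}
labels = ["x", "y"]
```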

MaxFrame Extensions

DataFrame.mf.apply_chunk(func[, batch_rows, ...])

Apply a function that takes a pandas DataFrame and returns a pandas DataFrame or Series.

DataFrame.mf.collect_kv([columns, kv_delim, ...])

Merge values in specified columns into a key-value represented column.

DataFrame.mf.extract_kv([columns, kv_delim, ...])

Extract values in key-value represented columns into standalone columns.

DataFrame.mf.flatmap(func[, dtypes, raw, args])

Apply the given function to each row and then flatten results.

DataFrame.mf.map_reduce([mapper, reducer, ...])

Map-reduce API over certain DataFrames.

DataFrame.mf.rebalance([axis, factor, ...])

Make data more balanced across the entire cluster.

DataFrame.mf.reshuffle([group_by, sort_by, ...])

Shuffle data in a DataFrame or Series to make its distribution more random.

DataFrame.rechunk(chunk_size[, reassign_worker])

Rechunk DataFrame, Series or Index data.
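The extract_kv and collect_kv entries above convert between a single key-value string column and one column per key. The sketch below illustrates that round trip in plain Python under several assumptions: "=" separates a key from its value (kv_delim), "," separates pairs (item_delim is a hypothetical parameter name here), output columns are named by the bare key, and missing keys get fill_value. MaxFrame's actual defaults, column naming, and error handling may differ.

```python
from typing import Dict, List, Optional

Column = List[Optional[str]]


def extract_kv(values: Column, kv_delim: str = "=", item_delim: str = ",",
               fill_value: Optional[str] = None) -> Dict[str, Column]:
    """Split strings such as "k1=v1,k2=v2" into one output column per key."""
    parsed: List[Dict[str, str]] = []
    keys: List[str] = []  # keys in first-seen order become the new columns
    for text in values:
        pairs: Dict[str, str] = {}
        for item in (text.split(item_delim) if text else []):
            key, _, val = item.partition(kv_delim)
            pairs[key] = val
            if key not in keys:
                keys.append(key)
        parsed.append(pairs)
    # A row that lacks a key gets fill_value in that key's column.
    return {key: [pairs.get(key, fill_value) for pairs in parsed] for key in keys}


def collect_kv(columns: Dict[str, Column], kv_delim: str = "=",
               item_delim: str = ",") -> List[str]:
    """Inverse direction: join each row's non-missing values into "k=v" pairs."""
    n_rows = len(next(iter(columns.values()), []))
    return [
        item_delim.join(f"{key}{kv_delim}{vals[i]}"
                        for key, vals in columns.items() if vals[i] is not None)
        for i in range(n_rows)
    ]


raw = ["k1=v1,k2=v2", "k2=v3", None]
extracted = extract_kv(raw)
rebuilt = collect_kv(extracted)
```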

DataFrame.mf provides methods unique to MaxFrame. These methods are drawn from common MaxCompute application scenarios and are accessed as DataFrame.mf.<function/property>.
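To make the batch-oriented DataFrame.mf.apply_chunk and the row-expanding DataFrame.mf.flatmap easier to picture, the sketch below mimics their local semantics on a list of dict records instead of pandas DataFrames. It assumes that batch_rows caps how many rows each call to func receives and that flatmap emits every row func yields. Distribution across workers, output dtype declarations, and the other parameters are left out. Both functions are hypothetical stand-ins, not MaxFrame code.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def apply_chunk(rows: List[Record], func: Callable[[List[Record]], List[Record]],
                batch_rows: int) -> List[Record]:
    """Call func once per batch of at most batch_rows rows and concatenate the outputs."""
    if batch_rows < 1:
        raise ValueError("batch_rows must be positive")
    result: List[Record] = []
    for start in range(0, len(rows), batch_rows):
        result.extend(func(rows[start:start + batch_rows]))
    return result


def flatmap(rows: List[Record], func: Callable[[Record], Iterable[Record]]) -> List[Record]:
    """Call func on each row; every row it yields becomes an output row."""
    return [out for row in rows for out in func(row)]


batch_sizes: List[int] = []


def double_x(batch: List[Record]) -> List[Record]:
    batch_sizes.append(len(batch))  # record how the rows were grouped
    return [{"x": row["x"] * 2} for row in batch]


rows = [{"x": i} for i in range(5)]
doubled = apply_chunk(rows, double_x, batch_rows=2)
exploded = flatmap(rows[:2], lambda row: [row, {"x": -row["x"]}])
```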