DataFrame#

Constructor#

DataFrame([data, index, columns, dtype, ...])

Attributes and underlying data#

Axes

DataFrame.dtypes

Return the dtypes in the DataFrame.

DataFrame.memory_usage([index, deep])

Return the memory usage of each column in bytes.

DataFrame.ndim

Return an int representing the number of axes / array dimensions.

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame's columns based on the column dtypes.

DataFrame.shape

Conversion#

DataFrame.astype(dtype[, copy, errors])

Cast a pandas object to the specified dtype.

DataFrame.convert_dtypes([infer_objects, ...])

Convert columns to best possible dtypes using dtypes supporting pd.NA.

DataFrame.copy()

DataFrame.infer_objects([copy])

Attempt to infer better dtypes for object columns.
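The difference between an explicit cast and dtype inference is easier to see on a small frame. The sketch below uses plain pandas as a stand-in and assumes the methods listed above follow pandas semantics, which this listing does not itself confirm; the frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from this reference.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# astype casts to an explicitly requested dtype.
as_float = df.astype({"a": "float64"})

# convert_dtypes picks nullable extension dtypes that support pd.NA.
converted = df.convert_dtypes()

print(as_float.dtypes)
print(converted.dtypes)
```

Use astype when the target dtype is known, and convert_dtypes when the goal is to move to NA-aware dtypes without naming each one.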

Indexing, iteration#

DataFrame.at

Access a single value for a row/column label pair.

DataFrame.head([n])

Return the first n rows.

DataFrame.iat

Access a single value for a row/column pair by integer position.

DataFrame.iloc

Purely integer-location based indexing for selection by position.

DataFrame.insert(loc, column, value[, ...])

Insert column into DataFrame at specified location.

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

DataFrame.mask(cond[, other, inplace, axis, ...])

Replace values where the condition is True.

DataFrame.pop(item)

Return item and drop from frame.

DataFrame.query(expr[, inplace])

Query the columns of a DataFrame with a boolean expression.

DataFrame.tail([n])

Return the last n rows.

DataFrame.xs(key[, axis, level, drop_level])

Return cross-section from the Series/DataFrame.

DataFrame.where(cond[, other, inplace, ...])

Replace values where the condition is False.
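Label-based and position-based access, and the where/mask pair, are easy to confuse. The following is a minimal sketch in plain pandas, assuming the accessors and methods in this section behave as their pandas counterparts do; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]}, index=["x", "y", "z"])

by_label = df.loc["y", "b"]    # label-based: row "y", column "b"
by_position = df.iloc[1, 1]    # position-based: second row, second column

# where keeps values where the condition is True and replaces the rest.
kept = df.where(df > 0, 0)

# mask is the inverse: it replaces values where the condition is True.
hidden = df.mask(df > 0, 0)

print(kept)
print(hidden)
```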

Binary operator functions#

DataFrame.add(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

DataFrame.sub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

DataFrame.mul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

DataFrame.div(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

DataFrame.truediv(other[, axis, level, ...])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

DataFrame.floordiv(other[, axis, level, ...])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

DataFrame.mod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator mod).

DataFrame.pow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

DataFrame.dot(other)

Compute the matrix multiplication between the DataFrame and other.

DataFrame.radd(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

DataFrame.rsub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

DataFrame.rmul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

DataFrame.rdiv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

DataFrame.rtruediv(other[, axis, level, ...])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

DataFrame.rfloordiv(other[, axis, level, ...])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

DataFrame.rmod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator rmod).

DataFrame.rpow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

DataFrame.lt(other[, axis, level, fill_value])

Get Less than of dataframe and other, element-wise (binary operator lt).

DataFrame.gt(other[, axis, level, fill_value])

Get Greater than of dataframe and other, element-wise (binary operator gt).

DataFrame.le(other[, axis, level, fill_value])

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

DataFrame.ge(other[, axis, level, fill_value])

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

DataFrame.ne(other[, axis, level, fill_value])

Get Not equal to of dataframe and other, element-wise (binary operator ne).

DataFrame.eq(other[, axis, level, fill_value])

Get Equal to of dataframe and other, element-wise (binary operator eq).

DataFrame.combine(other, func[, fill_value, ...])

Perform column-wise combine with another DataFrame.

DataFrame.combine_first(other)

Update null elements with value in the same location in other.
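The method forms of the operators differ from the plain operators mainly through fill_value, and the r-prefixed variants swap the operands. This is a small sketch in plain pandas, assuming the methods above follow pandas alignment rules; the frames are hypothetical.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with partly overlapping indexes.
left = pd.DataFrame({"a": [1.0, np.nan]}, index=[0, 1])
right = pd.DataFrame({"a": [10.0, 20.0, 30.0]}, index=[0, 1, 2])

# The plain operator yields NaN wherever either side is missing.
plain = left + right

# fill_value substitutes for a value missing on one side before adding.
filled = left.add(right, fill_value=0)

# rsub reverses the operands: this computes 100 - left.
reflected = left.rsub(100)

# combine_first fills nulls in left with values at the same location in right.
patched = left.combine_first(right)

print(filled)
print(patched)
```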

Function application, GroupBy & window#

DataFrame.apply(func[, axis, raw, ...])

Apply a function along an axis of the DataFrame.

DataFrame.applymap(func[, na_action, ...])

Apply a function to a DataFrame elementwise.

DataFrame.agg([func, axis])

Aggregate using one or more operations over the specified axis.

DataFrame.aggregate([func, axis])

Aggregate using one or more operations over the specified axis.

DataFrame.ewm([com, span, halflife, alpha, ...])

Provide exponential weighted functions.

DataFrame.expanding([min_periods, shift, ...])

Provide expanding transformations.

DataFrame.groupby([by, level, as_index, ...])

Group DataFrame using a mapper or by a Series of columns.

DataFrame.map(func[, na_action, dtypes, ...])

Apply a function to a DataFrame elementwise.

DataFrame.rolling(window[, min_periods, ...])

Provide rolling window calculations.

DataFrame.transform(func[, axis, dtypes, ...])

Call func on self producing a DataFrame with transformed values.
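The methods in this section differ mostly in the shape of what they return: aggregations reduce, transforms keep the input shape, and window methods produce one value per row. The sketch below illustrates this in plain pandas, on the assumption that the listed methods mirror pandas; the column names are made up. Some signatures above list a dtypes parameter that pandas does not have, so it is not used here.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 3, 10]})

# groupby + sum reduces each group to one value.
totals = df.groupby("team")["score"].sum()

# apply runs a function on each column (axis=0 by default).
spread = df[["score"]].apply(lambda col: col.max() - col.min())

# transform returns an object with the same shape as its input.
doubled = df[["score"]].transform(lambda x: x * 2)

# agg with a list of operations returns one row per operation.
summary = df[["score"]].agg(["min", "max"])

# rolling computes over a sliding window; the first window is incomplete.
rolling = df["score"].rolling(window=2).mean()

print(totals)
print(summary)
```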

Computations / descriptive stats#

DataFrame.abs()

DataFrame.all([axis, bool_only, skipna, ...])

DataFrame.any([axis, bool_only, skipna, ...])

DataFrame.clip([lower, upper, axis, inplace])

Trim values at input threshold(s).

DataFrame.count([axis, level, numeric_only])

DataFrame.corr([method, min_periods])

Compute pairwise correlation of columns, excluding NA/null values.

DataFrame.corrwith(other[, axis, drop, method])

Compute pairwise correlation.

DataFrame.cov([min_periods, ddof, numeric_only])

Compute pairwise covariance of columns, excluding NA/null values.

DataFrame.describe([percentiles, include, ...])

Generate descriptive statistics.

DataFrame.diff([periods, axis])

First discrete difference of element.

DataFrame.eval(expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

DataFrame.max([axis, skipna, level, ...])

DataFrame.mean([axis, skipna, level, ...])

DataFrame.median([axis, skipna, level, ...])

DataFrame.min([axis, skipna, level, ...])

DataFrame.mode([axis, numeric_only, dropna, ...])

Get the mode(s) of each element along the selected axis.

DataFrame.nunique([axis, dropna])

Count distinct observations over requested axis.

DataFrame.pct_change([periods, fill_method, ...])

Percentage change between the current and a prior element.

DataFrame.prod([axis, skipna, level, ...])

DataFrame.product([axis, skipna, level, ...])

DataFrame.quantile([q, axis, numeric_only, ...])

Return values at the given quantile over requested axis.

DataFrame.rank([axis, method, numeric_only, ...])

Compute numerical data ranks (1 through n) along axis.

DataFrame.round([decimals])

Round a DataFrame to a variable number of decimal places.

DataFrame.sem([axis, skipna, level, ddof, ...])

DataFrame.std([axis, skipna, level, ddof, ...])

DataFrame.sum([axis, skipna, level, ...])

DataFrame.value_counts([subset, normalize, ...])

DataFrame.var([axis, skipna, level, ddof, ...])
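Several entries in this section have no description, so a short illustration of the reductions may help. This is a sketch in plain pandas that assumes the methods above share pandas semantics; the data is invented.

```python
import pandas as pd

# Hypothetical sample data where y is exactly 2 * x.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})

means = df.mean()            # one value per column
medians = df.quantile(0.5)   # the 0.5 quantile of each column
corr = df.corr()             # pairwise correlation of columns

# clip trims values to the given lower and upper thresholds.
clipped = df.clip(lower=2, upper=6)

print(means)
print(clipped)
```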

Reindexing / selection / label manipulation#

DataFrame.add_prefix(prefix)

Prefix labels with string prefix.

DataFrame.add_suffix(suffix)

Suffix labels with string suffix.

DataFrame.align(other[, join, axis, level, ...])

Align two objects on their axes with the specified join method.

DataFrame.at_time(time[, axis])

Select values at particular time of day (e.g., 9:30AM).

DataFrame.between_time(start_time, end_time)

Select values between particular times of the day (e.g., 9:00-9:30 AM).

DataFrame.drop([labels, axis, index, ...])

Drop specified labels from rows or columns.

DataFrame.drop_duplicates([subset, keep, ...])

Return DataFrame with duplicate rows removed.

DataFrame.droplevel(level[, axis])

Return Series/DataFrame with requested index / column level(s) removed.

DataFrame.duplicated([subset, keep, method])

Return boolean Series denoting duplicate rows.

DataFrame.filter([items, like, regex, axis])

Subset the dataframe rows or columns according to the specified index labels.

DataFrame.head([n])

Return the first n rows.

DataFrame.idxmax([axis, skipna])

Return index of first occurrence of maximum over requested axis.

DataFrame.idxmin([axis, skipna])

Return index of first occurrence of minimum over requested axis.

DataFrame.reindex([labels, index, columns, ...])

Conform Series/DataFrame to new index with optional filling logic.

DataFrame.reindex_like(other[, method, ...])

Return an object with matching indices as other object.

DataFrame.rename([mapper, index, columns, ...])

Alter axes labels.

DataFrame.rename_axis([mapper, index, ...])

Set the name of the axis for the index or columns.

DataFrame.reset_index([level, drop, ...])

Reset the index, or a level of it.

DataFrame.sample([n, frac, replace, ...])

Return a random sample of items from an axis of object.

DataFrame.set_axis(labels[, axis, inplace])

Assign desired index to given axis.

DataFrame.set_index(keys[, drop, append, ...])

Set the DataFrame index using existing columns.

DataFrame.take(indices[, axis])

Return the elements in the given positional indices along an axis.

DataFrame.truncate([before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.
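A typical label-manipulation chain combines several of the methods above. The following sketch uses plain pandas and assumes equivalent behavior for the methods listed here; the frame and the label "c" are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one duplicated row.
df = pd.DataFrame({"k": ["a", "b", "b"], "v": [1, 2, 2]})

deduped = df.drop_duplicates()                    # drops the repeated ("b", 2) row
indexed = deduped.set_index("k")                  # column "k" becomes the index
renamed = indexed.rename(columns={"v": "value"})

# reindex conforms to new labels; labels not present yield missing values.
reindexed = renamed.reindex(["a", "b", "c"])

# reset_index moves the index back into a regular column.
restored = renamed.reset_index()

print(reindexed)
print(restored)
```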

Missing data handling#

DataFrame.dropna([axis, how, thresh, ...])

Remove missing values.

DataFrame.fillna([value, method, axis, ...])

Fill NA/NaN values using the specified method.

DataFrame.isna()

Detect missing values.

DataFrame.isnull()

Detect missing values.

DataFrame.notna()

Detect existing (non-missing) values.

DataFrame.notnull()

Detect existing (non-missing) values.
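Detecting, dropping, and filling missing values are usually used together. This is a minimal sketch in plain pandas, assuming the methods above match pandas behavior; the data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

missing = df.isna()                     # boolean frame, True where a value is missing
complete_rows = df.dropna()             # keeps rows with no missing values
not_all_missing = df.dropna(how="all")  # drops only rows where every value is missing
zero_filled = df.fillna(0)              # replaces missing values with 0

print(missing)
print(zero_filled)
```

isnull and notnull are listed with the same descriptions as isna and notna, so they are not shown separately.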

Reshaping, sorting, transposing#

DataFrame.melt([id_vars, value_vars, ...])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

DataFrame.nlargest(n, columns[, keep])

Return the first n rows ordered by columns in descending order.

DataFrame.nsmallest(n, columns[, keep])

Return the first n rows ordered by columns in ascending order.

DataFrame.pivot(columns[, index, values])

Return reshaped DataFrame organized by given index / column values.

DataFrame.pivot_table([values, index, ...])

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.reorder_levels(order[, axis])

Rearrange index levels using input order.

DataFrame.sort_values(by[, axis, ascending, ...])

Sort by the values along either axis.

DataFrame.sort_index([axis, level, ...])

Sort object by labels (along an axis).

DataFrame.swaplevel([i, j, axis])

Swap levels i and j in a MultiIndex.

DataFrame.stack([level, dropna])

Stack the prescribed level(s) from columns to index.

DataFrame.unstack([level, fill_value])

Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
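melt and pivot are roughly inverse operations, which a round trip makes concrete. The sketch below is written against plain pandas and assumes the methods in this section behave the same way; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# melt: wide to long, keeping "id" as the identifier column.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# pivot: long back to wide, one column per distinct "variable".
back = long.pivot(index="id", columns="variable", values="value")

top = wide.nlargest(1, "x")                       # row with the largest "x"
ordered = wide.sort_values("y", ascending=False)  # rows sorted by "y", descending

print(long)
print(back)
```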

Combining / comparing / joining / merging#

DataFrame.append(other[, ignore_index, ...])

Append rows of other to the end of caller, returning a new object.

DataFrame.assign(**kwargs)

Assign new columns to a DataFrame.

DataFrame.compare(other[, align_axis, ...])

Compare to another DataFrame and show the differences.

DataFrame.join(other[, on, how, lsuffix, ...])

Join columns of another DataFrame.

DataFrame.merge(right[, how, on, left_on, ...])

Merge DataFrame or named Series objects with a database-style join.

DataFrame.update(other[, join, overwrite, ...])

Modify in place using non-NA values from another DataFrame.
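merge matches on column values while join matches on the index. A small sketch in plain pandas shows the difference, assuming the methods above follow pandas join semantics; both frames are invented.

```python
import pandas as pd

# Two hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

inner = left.merge(right, on="key")               # only keys present in both
outer = left.merge(right, on="key", how="outer")  # keys present in either

# join aligns on the index, so both sides are indexed by "key" first.
joined = left.set_index("key").join(right.set_index("key"), how="left")

# assign returns a new frame with an extra column.
with_flag = left.assign(is_big=left["key"] > 1)

print(inner)
print(joined)
```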

Plotting#

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.

DataFrame.plot.area(*args, **kwargs)

Draw a stacked area plot.

DataFrame.plot.bar(*args, **kwargs)

Vertical bar plot.

DataFrame.plot.barh(*args, **kwargs)

Make a horizontal bar plot.

DataFrame.plot.box(*args, **kwargs)

Make a box plot of the DataFrame columns.

DataFrame.plot.density(*args, **kwargs)

Generate Kernel Density Estimate plot using Gaussian kernels.

DataFrame.plot.hexbin(*args, **kwargs)

Generate a hexagonal binning plot.

DataFrame.plot.hist(*args, **kwargs)

Draw one histogram of the DataFrame's columns.

DataFrame.plot.kde(*args, **kwargs)

Generate Kernel Density Estimate plot using Gaussian kernels.

DataFrame.plot.line(*args, **kwargs)

Plot Series or DataFrame as lines.

DataFrame.plot.pie(*args, **kwargs)

Generate a pie plot.

DataFrame.plot.scatter(*args, **kwargs)

Create a scatter plot with varying marker point size and color.

Serialization / IO / conversion#

DataFrame.from_dict(data[, orient, dtype, ...])

Construct DataFrame from dict of array-like or dicts.

DataFrame.from_records(data[, index, ...])

Convert structured or record ndarray to DataFrame.

DataFrame.to_clipboard(*[, excel, sep, ...])

Copy object to the system clipboard.

DataFrame.to_csv(path[, sep, na_rep, ...])

Write object to a comma-separated values (csv) file.

DataFrame.to_dict([orient, into, index, ...])

Convert the DataFrame to a dictionary.

DataFrame.to_json([path, orient, ...])

Convert the object to a JSON string.

DataFrame.to_odps_table(table[, partition, ...])

Write DataFrame object into a MaxCompute (ODPS) table.

DataFrame.to_pandas([session])

DataFrame.to_parquet(path[, engine, ...])

Write a DataFrame to the binary Parquet format; each chunk is written to a Parquet file.
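The in-memory constructors and converters in this section can be tried without any storage backend. The sketch below uses plain pandas and assumes from_dict, from_records, and to_dict behave as in pandas; the writers that need a path or a MaxCompute table are deliberately left out, and the data is hypothetical.

```python
import pandas as pd

# Build a frame from a dict of lists (one key per column).
from_cols = pd.DataFrame.from_dict({"a": [1, 2], "b": ["x", "y"]})

# Build the same frame from a list of row tuples.
from_rows = pd.DataFrame.from_records([(1, "x"), (2, "y")], columns=["a", "b"])

# Convert back to plain Python: one dict per row.
records = from_cols.to_dict(orient="records")

print(records)
```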

MaxFrame Extensions#

DataFrame.mf.apply_chunk(func[, batch_rows, ...])

Apply a function that takes pandas DataFrame and outputs pandas DataFrame/Series.

DataFrame.mf.collect_kv([columns, kv_delim, ...])

Merge values in specified columns into a key-value represented column.

DataFrame.mf.extract_kv([columns, kv_delim, ...])

Extract values in key-value represented columns into standalone columns.

DataFrame.mf.flatmap(func[, dtypes, raw, args])

Apply the given function to each row and then flatten results.

DataFrame.mf.map_reduce([mapper, reducer, ...])

Map-reduce API over certain DataFrames.

DataFrame.mf.rebalance([axis, factor, ...])

Make data more balanced across entire cluster.

DataFrame.mf.reshuffle([group_by, sort_by, ...])

Shuffle data in DataFrame or Series to make data distribution more randomized.

DataFrame.mf provides methods unique to MaxFrame. These methods are drawn from application scenarios in MaxCompute and can be accessed as DataFrame.mf.<function/property>.