maxframe.dataframe.read_odps_table
- maxframe.dataframe.read_odps_table(table_name: str | Table, partitions: None | str | List[str] = None, columns: List[str] | None = None, index_col: None | str | List[str] = None, *, odps_entry: ODPS = None, string_as_binary: bool = None, append_partitions: bool = False, dtype_backend: str = None, default_index_type: DefaultIndexType = None, filters: str | List[List[Tuple]] = None, **kw)
Read data from a MaxCompute (ODPS) table into a DataFrame.
One or more columns can be used as the index. If no index column is specified, a RangeIndex is generated.
- Parameters:
table_name (Union[str, Table]) – Name of the table to read from.
partitions (Union[None, str, List[str]]) – Table partition or list of partitions to read from.
columns (Optional[List[str]]) – Table columns to read. Partition columns may also be listed here. If not specified, all table columns are included, plus the partition columns when append_partitions is True.
index_col (Union[None, str, List[str]]) – Column or columns to use as the index.
append_partitions (bool) – If True, all partition columns are added to the selected columns when columns is not specified.
dtype_backend ({'numpy', 'pyarrow'}, default 'numpy') – Back-end data type applied to the resultant DataFrame (still experimental).
filters (Union[str, List[List[Tuple]]], default None) –
Filter expression to apply when reading data.
String format: SQL WHERE clause passed directly to StorageAPI.
List format: Nested list of tuples.
Format: Inner lists are ANDed, outer lists are ORed. Example:
[[('col1', '==', 'value'), ('col2', '>', 10)]]
Supported operators:
==, !=, <, >, <=, >=, in, not in.
Note
Given the current implementation, complete filtering is not guaranteed for this argument.
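The AND/OR semantics of the list format described above can be illustrated with a small pure-Python sketch. It is not MaxFrame's implementation: row_matches and _OPS are hypothetical names introduced here, rows are modelled as plain dicts, and the operand of in and not in is assumed to be a list.

```python
import operator

# Comparison functions for the operators listed in the parameter description.
# This mapping is an illustration, not part of the MaxFrame API.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, choices: value in choices,
    "not in": lambda value, choices: value not in choices,
}


def row_matches(row, filters):
    """Return True if `row` (a dict) satisfies the nested-list `filters`.

    Conditions inside an inner list are ANDed; the inner lists are ORed.
    """
    return any(
        all(_OPS[op](row[col], operand) for col, op, operand in group)
        for group in filters
    )


filters = [
    [("age", ">", 18), ("city", "==", "Beijing")],   # age > 18 AND city == 'Beijing'
    [("city", "in", ["Shanghai", "Hangzhou"])],      # OR city in (...)
]
print(row_matches({"age": 30, "city": "Beijing"}, filters))   # True
print(row_matches({"age": 10, "city": "Beijing"}, filters))   # False
print(row_matches({"age": 10, "city": "Shanghai"}, filters))  # True
```

Because complete filtering is not guaranteed, a predicate of this kind could also serve as a reference when checking which rows a filter is expected to keep.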
- Returns:
result – DataFrame read from MaxCompute (ODPS) table
- Return type:
DataFrame
Examples
Before using read_odps_table, you need to create an ODPS entry whose parameters will be stored globally in the current process.
>>> import maxframe.dataframe as md
>>> from odps import ODPS
>>>
>>> o = ODPS(...)  # Fill account information here
Simply read a table by name.
>>> df = md.read_odps_table("simple_table")
Read table by partition (or partitions).
>>> # Read partitioned table
>>> df = md.read_odps_table("partitioned_table", partitions="pt=20230101")
>>> # Read with multiple partitions
>>> df = md.read_odps_table("partitioned_table", partitions=["pt=20230101", "pt=20230102"])
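When many daily partitions are needed, the list passed to partitions can be generated rather than typed by hand. The following is a minimal sketch, assuming a single partition column named pt with values formatted as YYYYMMDD, as in the example above. daily_partitions is a hypothetical helper and not part of MaxFrame.

```python
from datetime import date, timedelta


def daily_partitions(start, end, column="pt"):
    """Build one partition spec per day from `start` to `end`, inclusive.

    Assumes the partition value is the date formatted as YYYYMMDD.
    """
    days = (end - start).days
    return [
        f"{column}={(start + timedelta(days=offset)):%Y%m%d}"
        for offset in range(days + 1)
    ]


partitions = daily_partitions(date(2023, 1, 1), date(2023, 1, 3))
print(partitions)  # ['pt=20230101', 'pt=20230102', 'pt=20230103']
```

The resulting list could then be passed as the partitions argument of read_odps_table.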
Read with column selection.
>>> df = md.read_odps_table("table_name", columns=["col1", "col2", "col3"])
Read with columns as index.
>>> # Read with index columns
>>> df = md.read_odps_table("table_name", index_col="id")
>>> # Read with multiple index columns
>>> df = md.read_odps_table("table_name", index_col=["id", "timestamp"])
Read with filter condition. Note that complete filtering is not guaranteed.
>>> # Read table with string filter
>>> df = md.read_odps_table("source_table", filters="age > 18")
>>> # Read with list filter
>>> df = md.read_odps_table(
...     "table_name",
...     filters=[[('age', '>', 18), ('city', '==', 'Beijing')]]
... )
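The string and list forms of filters express the same kind of condition. The sketch below shows one way a list-format filter might be rendered as a WHERE-style string. filters_to_where and its helpers are hypothetical and not part of MaxFrame, and whether the generated text is accepted by the storage backend is an assumption that has not been verified here. String values containing quotes are rejected so that the sketch avoids dialect-specific escaping rules.

```python
# Mapping from list-format operators to SQL operators (illustrative only).
_SQL_OPS = {
    "==": "=",
    "!=": "!=",
    "<": "<",
    ">": ">",
    "<=": "<=",
    ">=": ">=",
    "in": "IN",
    "not in": "NOT IN",
}


def _literal(value):
    """Render a Python value as a SQL literal; strings are single-quoted."""
    if isinstance(value, str):
        if "'" in value:
            raise ValueError("quote characters are not handled in this sketch")
        return "'" + value + "'"
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(_literal(item) for item in value) + ")"
    return str(value)


def filters_to_where(filters):
    """Join inner lists with AND and the outer list with OR."""
    groups = []
    for group in filters:
        terms = [
            f"{col} {_SQL_OPS[op]} {_literal(operand)}"
            for col, op, operand in group
        ]
        groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)


where = filters_to_where([[("age", ">", 18), ("city", "==", "Beijing")]])
print(where)  # (age > 18 AND city = 'Beijing')
```

Each inner list becomes one parenthesised group, so adding a second inner list yields two groups joined by OR.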