maxframe.dataframe.DataFrame.to_lance

DataFrame.to_lance(path, mode: str = 'create', index: bool = None, index_label=None, schema=None, storage_options: dict = None, **kwargs)

Write a DataFrame to a Lance dataset.

Lance is a columnar data format optimized for ML workloads and vector search. Each chunk of the DataFrame is written as a separate fragment of the Lance dataset.

Parameters:
  • path (str) – Target path for the Lance dataset. Can be a local path, an Aliyun OSS URL, or an S3 URL. For Aliyun OSS URLs, the format is oss://<endpoint>/<bucket>/<path>, for example oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset. For S3 URLs, the format is s3://<bucket>/<path>.

  • mode ({'create', 'append', 'overwrite'}, default 'create') – How to handle existing data:
      - 'create': Create a new dataset (fails if the dataset already exists).
      - 'append': Append to the existing dataset.
      - 'overwrite': Overwrite the existing dataset.

  • index (bool, optional) – If True, write the DataFrame index as a column. If False, do not write the index. If None (default), write the index only if it is not a simple RangeIndex (the same default behavior as pyarrow.Table.from_pandas).

  • index_label (str or list of str, optional) – Column label(s) for the index column(s). If None (default) and index is True, the index names are used. Use this to rename the index column when writing (e.g., from ‘_idx_0’ to ‘id’).

  • schema (pyarrow.Schema, optional) – PyArrow schema specifying column types, compression, encoding, etc. Columns listed in this schema override the types auto-detected from the DataFrame; columns not listed keep the types inferred from the DataFrame.

  • storage_options (dict, optional) – Options for the storage connection. For Aliyun OSS with a RAM role: {'role_arn': 'acs:ram::xxx:role/name'}

  • **kwargs – Additional keyword arguments passed to lance.fragment.write_fragments().
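The interaction between index and index_label can be summarized as a small decision rule. The function below is a hypothetical, pure-Python sketch of that rule as described above; it is not part of MaxFrame, it takes the index names and a RangeIndex flag as plain inputs instead of a real index object, and it assumes the index names are reused whenever the index is written without an explicit label. The length check on index_label is an assumption added for the sketch only.

```python
from typing import List, Optional, Sequence, Union


def index_columns_to_write(
    index_names: Sequence[str],
    is_range_index: bool,
    index: Optional[bool] = None,
    index_label: Union[str, Sequence[str], None] = None,
) -> List[str]:
    """Hypothetical model: column names the index is written under ([] = not written)."""
    # index=None: write the index only when it is not a simple RangeIndex.
    write_index = (not is_range_index) if index is None else index
    if not write_index:
        return []
    if index_label is None:
        # Assumption: without an explicit label, the existing index names are reused.
        return list(index_names)
    # A single string labels a single-level index; a list labels each level.
    labels = [index_label] if isinstance(index_label, str) else list(index_label)
    if len(labels) != len(index_names):
        # Sanity check added for this sketch only.
        raise ValueError("index_label must provide one label per index level")
    return labels
```

For instance, index_columns_to_write(["_idx_0"], is_range_index=False, index_label="id") returns ["id"], which mirrors the renaming from '_idx_0' to 'id' mentioned under index_label.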

Returns:

An empty DataFrame representing the result of the write operation.

Return type:

DataFrame

Examples

>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> # Write to Aliyun OSS with RAM role
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
>>> # Append mode
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     mode='append',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
>>> # Overwrite mode
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     mode='overwrite',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
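The three calls above differ only in mode. The following hypothetical, pure-Python sketch models how each mode combines the newly written fragments (one per chunk) with whatever already exists at the target path. It uses an in-memory dict in place of real storage, ignores Lance's dataset versioning, and treats appending to a missing dataset as an error; that last behavior is an assumption of the sketch, not something this page specifies.

```python
from typing import Dict, List


def apply_write_mode(
    datasets: Dict[str, List[str]],
    path: str,
    new_fragments: List[str],
    mode: str = "create",
) -> List[str]:
    """Hypothetical model: fragment list stored at `path` after a write in `mode`."""
    exists = path in datasets
    if mode == "create":
        # 'create' refuses to touch an existing dataset.
        if exists:
            raise FileExistsError(f"dataset already exists: {path}")
        datasets[path] = list(new_fragments)
    elif mode == "append":
        # Assumption for this sketch: appending to a missing dataset is an error.
        if not exists:
            raise FileNotFoundError(f"no dataset to append to: {path}")
        datasets[path] = datasets[path] + list(new_fragments)
    elif mode == "overwrite":
        # The new fragments replace the previous contents (versioning is ignored).
        datasets[path] = list(new_fragments)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return datasets[path]
```

Replaying the three example calls against an empty dict, with two chunks per write, yields two fragments after the first call, four after the append, and two again after the overwrite.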