maxframe.dataframe.DataFrame.to_lance
- DataFrame.to_lance(path, mode: str = 'create', index: bool = None, index_label=None, schema=None, storage_options: dict = None, **kwargs)
Write a DataFrame to a Lance dataset.
Lance is a columnar data format optimized for ML workloads and vector search. Each chunk of the DataFrame is written as a separate Lance fragment.
- Parameters:
path (str) – Target path for the Lance dataset. Can be a local path, Aliyun OSS URL, or S3 URL. For Aliyun OSS URLs, the format is: oss://<endpoint>/<bucket>/<path>. Example: oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset. For S3 URLs, the format is: s3://<bucket>/<path>.
mode ({'create', 'append', 'overwrite'}, default 'create') – How to handle existing data:
  - ‘create’: Create a new dataset (fails if exists)
  - ‘append’: Append to existing dataset
  - ‘overwrite’: Overwrite existing dataset
index (bool, optional) – If True, write DataFrame index as a column. If False, do not write the index. If None (default), write the index only if it’s not a simple RangeIndex (same as pa.Table.from_pandas default behavior).
index_label (str or list of str, optional) – Column label(s) for the index column(s). If None (default) and index is True, the index names are used. Use this to rename the index column when writing (e.g., from ‘_idx_0’ to ‘id’).
schema (pyarrow.Schema, optional) – PyArrow schema to specify column types, compression, encoding, etc. Columns in this schema will override auto-detected types from DataFrame. Columns not specified will use types inferred from DataFrame.
storage_options (dict, optional) – Options for storage connection. For Aliyun OSS with RAM role: {'role_arn': 'acs:ram::xxx:role/name'}
**kwargs – Additional keyword arguments passed to lance.fragment.write_fragments().
- Returns:
An empty DataFrame (write operation result).
- Return type:
Examples
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> # Write to Aliyun OSS with RAM role
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
>>> # Append mode
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     mode='append',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
>>> # Overwrite mode
>>> df.to_lance(
...     'oss://oss-cn-beijing.aliyuncs.com/my-bucket/dataset',
...     mode='overwrite',
...     storage_options={'role_arn': 'acs:ram::1234567890:role/maxframe-oss'}
... ).execute()
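
The index and index_label parameters, together with the documented OSS path format, can be combined as in the following minimal sketch. The helper names build_oss_path and write_with_named_index are hypothetical and not part of MaxFrame; the sketch only assumes that df is a MaxFrame DataFrame exposing to_lance() with the parameters listed above.

```python
from typing import Optional


def build_oss_path(endpoint: str, bucket: str, key: str) -> str:
    """Assemble a URL in the documented oss://<endpoint>/<bucket>/<path> form.

    Hypothetical helper, not a MaxFrame API.
    """
    # Strip surrounding slashes so the parts join with exactly one '/'.
    return f"oss://{endpoint}/{bucket}/{key.strip('/')}"


def write_with_named_index(
    df,
    path: str,
    index_label: str = "id",
    role_arn: Optional[str] = None,
):
    """Write df to a new Lance dataset, storing the index as a named column.

    Hypothetical helper: df is assumed to be a MaxFrame DataFrame.
    """
    # Only pass storage_options when a RAM role is supplied.
    storage_options = {"role_arn": role_arn} if role_arn else None
    result = df.to_lance(
        path,
        mode="create",            # fails if the dataset already exists
        index=True,               # always write the index as a column
        index_label=index_label,  # rename the index column on write
        storage_options=storage_options,
    )
    # to_lance() is lazy; execute() triggers the actual write.
    return result.execute()


# Usage (requires a configured MaxFrame session):
# import maxframe.dataframe as md
# df = md.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
# path = build_oss_path('oss-cn-beijing.aliyuncs.com', 'my-bucket', 'dataset')
# write_with_named_index(df, path, role_arn='acs:ram::1234567890:role/maxframe-oss')
```

Passing index=True explicitly avoids relying on the index=None default, which skips a simple RangeIndex.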