添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement . We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Description

In the Arrow project, we recently created a new protocol for sharing Arrow data in Python . One of the goals of the protocol is allow exporting / importing Arrow data in Python without having to necessarily use PyArrow as an intermediary. For example, DuckDB can read from Polars DataFrames and LazyFrames, but only if PyArrow is installed. One this protocol is implemented, it would be possible to accomplish that integration without PyArrow.

This allows Arrow-exportable objects to be recognized based on the presence of one of several dunder methods.

Polars could implement this in two ways:

  • Add Arrow PyCapsule dunder methods to Polars objects
  • That would be: DataFrame , Series , DataType
  • Support Arrow PyCapsules in polars.from_arrow
  • polars.from_arrow
  • Support Arrow PyCapsules in polars.DataFrame constructor
  • You already support pd.DataFrame , so it would make logical sense to support reading rectangular-shaped Arrow data.
  • I'd be happy to contribute this to the repo, if these ideas sound good.

    jorisvandenbossche, pitrou, kylebarron, WillAyd, s-b90, and jonmmease reacted with thumbs up emoji ion-elgreco and kylebarron reacted with rocket emoji All reactions

    Sorry for the delay. Somehow I missed this. I think this sounds great. Being agnostic to arrow consumer without hard pyarrow dependency sound good.

    Does your offer still stand on this?

    This is mostly resolved by #17676 , #17693 , and #17935 . Potential follow ups include:

  • Implementing the interface for Schema and DataType objects
  • Casting exported data according to requested_schema
  •