examples(polars): Polars × PyMEOS TemporalParquet round-trip example (depends on PyMEOS #84) #6
Open
estebanzimanyi wants to merge 1 commit into
Adds PyMEOS_Examples/Polars_TemporalParquet.py demonstrating the zero-copy bridge between PyMEOS' data-lake interchange layer (`pymeos.io`) and the Polars DataFrame engine.

Round-trip covered:

1. Build a temporal-point dataset using PyMEOS (3 trips, 4 instants each)
2. Write to TemporalParquet via `pymeos.io.write_temporal`: opaque MEOS-WKB payload + native-scalar sidecar columns + self-describing `temporal` footer (byte-compatible with MobilityDuck's `temporalFooter()` consumer recipe)
3. Read back with PyMEOS: full PyMEOS object reconstruction
4. Consume the SAME file in Polars zero-copy via `pl.from_arrow`: Polars sees sidecar columns as native primitives
5. Sidecar-driven predicate pushdown via `pyarrow.parquet.read_table` `filters=[…]`: row-groups pruned before any per-row decode

Depends on the `pymeos.io` module shipping in PyMEOS PR #84 (`feat/datalake-consumer`). Until #84 reaches PyMEOS master, adopters install PyMEOS from the branch directly:

    pip install "git+https://github.com/MobilityDB/PyMEOS.git@feat/datalake-consumer#egg=pymeos[parquet]"

After #84 merges, the standard `pip install pymeos[parquet]` path works without code changes.

README updated to index the new example with the install caveat.
Adds a worked-out example of consuming TemporalParquet files from the Polars DataFrame engine, zero-copy via PyMEOS' `pymeos.io` data-lake interchange layer.

What's in the example

`PyMEOS_Examples/Polars_TemporalParquet.py`: a single self-contained script demonstrating the full round-trip:

1. Build a temporal-point dataset using PyMEOS (3 trips, 4 instants each).
2. Write to TemporalParquet via `pymeos.io.write_temporal`: opaque MEOS-WKB payload column + native-scalar sidecar columns (`<col>__xmin/xmax/ymin/ymax/tmin/tmax`) + self-describing `temporal` footer in the Parquet schema metadata. Byte-compatible with files written by MobilityDuck's `temporalFooter()` consumer recipe, so files are portable across both tools.
3. Read back with PyMEOS: full `TGeomPointSeq` object reconstruction.
4. Consume the same file in Polars zero-copy via `pl.from_arrow(pyarrow.parquet.read_table(path))`. Polars sees the sidecar columns as native primitives, so its lazy / predicate-pushdown machinery works without decoding the MEOS-WKB payload. The temporal column appears as opaque `BINARY` for analysts who don't need MEOS-aware operations on every column.
5. Sidecar-driven predicate pushdown: `pyarrow.parquet.read_table(filters=[("trip__xmax", "<", 4.45)])` prunes row groups before any per-row decode.

The example shows the dual consumption model that motivates the data-lake layer: PyMEOS for MEOS-aware reads, Polars (or any Arrow-aware engine) for native-column analytics, both reading the same on-disk file.
Install caveat
The `pymeos.io` module ships in PyMEOS PR #84 (`feat/datalake-consumer`, OPEN at time of writing). Until #84 merges into PyMEOS master, install PyMEOS from the branch directly:

    pip install "git+https://github.com/MobilityDB/PyMEOS.git@feat/datalake-consumer#egg=pymeos[parquet]"
    pip install polars pyarrow

After #84 merges, the standard install path works with zero code change:

    pip install "pymeos[parquet]" polars pyarrow

The script itself doesn't reference any branch-specific path, only `pymeos.io`, which is the stable public surface in PR #84.

Why this PR lands now rather than after #84 merges
Two reasons:
1. `pymeos.io`'s public surface is genuinely Polars-compatible. Writing the example now surfaces any contract gaps while PR #84 is still in review (the script uses `to_arrow`, `from_arrow`, `write_temporal`, `read_temporal`, `temporal_footer`, which is the full public surface).
2. The README's install instruction is explicit about the dependency, so users hitting the example before #84 lands aren't surprised.
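A quick way to spot a contract gap before running the full example is to check that the five names exist on the installed module. This is a minimal sketch using only the standard library; the helper `missing_names` is hypothetical and not part of the example script, and it checks only that the names are present, not their signatures.

```python
import importlib

# Names the PR description lists as the public surface of pymeos.io.
EXPECTED_SURFACE = [
    "to_arrow",
    "from_arrow",
    "write_temporal",
    "read_temporal",
    "temporal_footer",
]


def missing_names(module_name, names):
    """Return the names that module_name does not expose.

    If the module cannot be imported at all, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]
```

Calling `missing_names("pymeos.io", EXPECTED_SURFACE)` returns an empty list when the branch install is in place, and the full list when PyMEOS is missing or was installed without the `pymeos.io` module.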
File checklist
- `PyMEOS_Examples/Polars_TemporalParquet.py`: the example script (~200 lines, single file, no other deps)
- `README.md`: one new bullet indexing the example with the install caveat

What's NOT in scope here
- `pl.scan_iceberg`: requires a live Iceberg catalog (e.g. Apache Polaris) and is gated on the MobilityDuck `temporal_iceberg_scan` UDF. Tracked separately per the `iceberg-readiness` memo; an Iceberg-Polars composition example would land as a sibling here once those substrates exist.
- `scan_pyarrow_dataset`: useful for multi-file Parquet datasets, but adds complexity without changing the conceptual round-trip. Easy follow-up once adopters ask for it.