Skip to content

[python] Support query auth (row filter & column masking) for REST catalog#8136

Open
MgjLLL wants to merge 1 commit into
apache:masterfrom
MgjLLL:python-query-auth
Open

[python] Support query auth (row filter & column masking) for REST catalog#8136
MgjLLL wants to merge 1 commit into
apache:masterfrom
MgjLLL:python-query-auth

Conversation

@MgjLLL
Copy link
Copy Markdown

@MgjLLL MgjLLL commented Jun 5, 2026

Purpose

Adds query-auth support to the Python client so it honors the row-level filter and column masking rules returned by a REST catalog, matching the existing JVM client behavior.

When the new option query-auth.enabled is set to true, before producing a Plan the client calls POST /v1/.../databases/{db}/tables/{tb}/auth with the projected fields, receives { filter, columnMasking }, and applies them on the read path:

  • RESTApi.auth_table_query issues the call (new request/response models AuthTableQueryRequest / AuthTableQueryResponse, new path in ResourcePaths.auth_table).
  • TableQueryAuth / TableQueryAuthResult (catalog/table_query_auth.py) wrap the result and convert each split to a QueryAuthSplit.
  • predicate_json_parser (common/predicate_json_parser.py) parses Paimon predicate JSON into a PyArrow compute filter (EQ/NEQ/LT/LTEQ/GT/GTEQ/IS_NULL/IS_NOT_NULL/IN/NOT_IN/STARTS_WITH/ENDS_WITH/CONTAINS/AND/OR/NOT).
  • AuthFilterReader / AuthMaskingReader / ColumnProjectReader (read/reader/auth_masking_reader.py) implement row filtering, column masking transforms (NULL, FIELD_REF, CAST, UPPER, LOWER, CONCAT, CONCAT_WS) and final projection back to the user's requested columns.
  • read_builder / stream_read_builder / table_read / table_scan / file_store_table / catalog_environment / rest_catalog are wired to invoke the auth call and pull extra fields required only by the auth filter.

Behavior is gated by the new CoreOptions.QUERY_AUTH_ENABLED (query-auth.enabled, default false), so existing users see no change.

Tests

Three new test files (994+ lines, all passing locally under pytest):

  • paimon-python/pypaimon/tests/predicate_json_parser_test.py — covers each predicate kind, nested AND/OR/NOT, type coercion, null handling, and extract_referenced_fields.
  • paimon-python/pypaimon/tests/auth_masking_reader_test.py — covers each masking transform, missing-field validation, and projection back to the user-requested columns.
  • paimon-python/pypaimon/tests/table_query_auth_test.py — end-to-end coverage: REST catalog calls auth_table_query, the result is plumbed into the plan, splits become QueryAuthSplit, and reads return filtered + masked rows.

Local check:

cd paimon-python
python -m pytest pypaimon/tests/predicate_json_parser_test.py \
                  pypaimon/tests/auth_masking_reader_test.py \
                  pypaimon/tests/table_query_auth_test.py -q
flake8 --config dev/cfg.ini pypaimon/  # 已在改动范围内通过

API and Format

  • New catalog option: query-auth.enabled (boolean, default false).
  • New REST endpoint consumed by the client: POST /v1/{prefix}/databases/{db}/tables/{tb}/auth. Request { "select": [...] }, response { "filter": [<predicate-json>...], "columnMasking": { <col>: <transform-json>, ... } }. The contract follows the existing Java client; no server-side change is required for catalogs that already implement query auth.
  • No change to existing user-facing Python APIs. New types (AuthTableQueryRequest, AuthTableQueryResponse, TableQueryAuth, TableQueryAuthResult, QueryAuthSplit, AuthFilterReader, AuthMaskingReader, ColumnProjectReader) are additive and live under existing modules.
  • File format / on-disk layout: unchanged.

Documentation

The new option query-auth.enabled should be reflected in the Python configuration reference. Happy to add the docs entry in this PR or in a follow-up — please advise.

This closes #8135

…talog

Adds query-auth support to the Python client so it honors the row-level
filter and column masking rules returned by a REST catalog, matching the
existing JVM client behavior.

When the new option `query-auth.enabled` is set to true, the client
calls `POST /v1/.../databases/{db}/tables/{tb}/auth` before producing a
plan, receives `{ filter, columnMasking }`, and applies them on the
read path:

  * `predicate_json_parser` parses Paimon predicate JSON into a
    PyArrow compute filter (EQ/NEQ/LT/LTEQ/GT/GTEQ/IS_NULL/IS_NOT_NULL/
    IN/NOT_IN/STARTS_WITH/ENDS_WITH/CONTAINS/AND/OR/NOT).
  * `AuthFilterReader` / `AuthMaskingReader` / `ColumnProjectReader`
    perform row filtering, column masking transforms (NULL, FIELD_REF,
    CAST, UPPER, LOWER, CONCAT, CONCAT_WS) and final projection back to
    the user's requested columns.
  * `TableQueryAuth` / `TableQueryAuthResult` wrap the result and
    convert each split to a `QueryAuthSplit`.

Behavior is gated by `CoreOptions.QUERY_AUTH_ENABLED` (default false),
so existing users see no change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] [python] Support query auth (row filter & column masking) for REST catalog

1 participant