[Python] Expose Expression.field_refs() to enumerate referenced fields

### Describe the enhancement requested

`pyarrow.compute.Expression` has no Python-accessible way to enumerate the
fields it references. The C++ side already exposes the underlying primitive
([`arrow::compute::FieldsInExpression`](https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/expression.h#L174)),
but the Python `Expression` class only surfaces `cast`, `equals`, `is_null`,
`is_nan`, `is_valid`, `isin`, and Substrait round-trip. Every downstream tool
that needs the column set of a predicate today either:

1. Regex-parses `str(expression)` (fragile — quoted string literals and
   keywords like `and` leak into the result).
2. Serializes to Substrait via `to_substrait(schema)` and walks the protobuf
   (heavy — requires a bound schema and a substrait dependency just to ask
   "which columns?").
3. Maintains a parallel AST upstream of `pc.Expression`, like
   [Ray Data's `_PyArrowExpressionVisitor`](https://docs.ray.io/en/releases-2.54.1/_modules/ray/data/expressions.html).

Exposing the existing C++ primitive removes all three workarounds.
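To make the fragility of workaround 1 concrete, here is a minimal sketch that runs a naive identifier regex over a string shaped like `str(expression)` output. The exact formatting of `pc.Expression.__str__` is an assumption here. The input is hard-coded, so the sketch does not need pyarrow installed.

```python
import re

# Hard-coded stand-in for str(expression); the exact formatting pyarrow
# produces is assumed, not guaranteed.
expr_text = '((a > 0) and (b == "x y"))'

# Naive workaround: treat every identifier-like token as a column name.
naive_refs = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr_text)

# The keyword `and` and the words inside the string literal leak through.
print(naive_refs)  # ['a', 'and', 'b', 'x', 'y']
```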

### Motivating use cases

The recurring shape is: a library or end user has a `pc.Expression` in hand
and needs to decide **which columns to read off disk** before evaluating it.

1. **Column projection on cold storage.** Wrapping `pyarrow.dataset.Scanner`
   or `pyiceberg.Table.scan(...)` with a user-supplied filter — the wrapper
   wants to set `selected_fields = user_projection ∪ filter_refs` to avoid
   pulling unused columns off S3 / disk.
2. **Conditional MERGE / upsert on Iceberg.** PyIceberg's `Table.upsert`
   currently has no `when_matched_condition` parameter
   ([apache/iceberg-python#1534](https://github.com/apache/iceberg-python/pull/1534)
   explicitly scoped to "when matched update all / when not matched insert
   all" and directed users to Spark for predicate-based MERGE). Implementing
   a conditional upsert in Python requires projecting only the destination
   columns the predicate touches before joining and filtering — which needs
   field-ref introspection.
3. **Predicate splitting across two sources.** Any library that accepts a
   single user-facing predicate and routes it across a join (source ↔ target,
   stream ↔ table, etc.) needs to bucket field references by side.
4. **Ray Data, delta-rs, Lance.** Cross-engine routers that translate
   `pc.Expression` to a non-Arrow execution engine all start with the same
   question — which fields does it touch? — to decide which engine knows
   about which columns and which side of a join to push the filter on.
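Use cases 1 and 3 reduce to simple set arithmetic once the references are available. The sketch below assumes a list in the proposed return shape (`str`, `int`, or `tuple`). In practice that list would come from the proposed `field_refs()`, which does not exist yet, so it is hard-coded here. The helpers `top_level_names` and `split_by_side` are hypothetical, and by-index (`int`) references are skipped because resolving them requires a schema.

```python
from typing import Iterable, Union

Ref = Union[str, int, tuple]


def top_level_names(refs: Iterable[Ref]) -> set[str]:
    """Collect the top-level column name each reference reads from."""
    names = set()
    for ref in refs:
        head = ref[0] if isinstance(ref, tuple) else ref
        # By-index references need a schema to resolve; skip them here.
        if isinstance(head, str):
            names.add(head)
    return names


def split_by_side(refs, source_cols, target_cols):
    """Bucket top-level names by which side of a join owns them."""
    names = top_level_names(refs)
    return {
        "source": names & set(source_cols),
        "target": names & set(target_cols),
        "unknown": names - set(source_cols) - set(target_cols),
    }


# Hard-coded stand-in for the output of the proposed field_refs(),
# including one nested reference.
filter_refs = ["a", "b", ("user", "city")]

# Use case 1: read the user's projection plus whatever the filter touches.
user_projection = ["id", "a"]
columns_to_read = sorted(set(user_projection) | top_level_names(filter_refs))
print(columns_to_read)  # ['a', 'b', 'id', 'user']

# Use case 3: route each referenced column to the side that owns it.
print(split_by_side(filter_refs, source_cols=["a", "user"], target_cols=["b"]))
```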

### Prior discussion

Comment thread on the closed
[#27160 [Python] Allow to create field reference to nested field](https://github.com/apache/arrow/issues/27160)
records this as a known gap that was never tracked:

> bkietz:
> "currently field_refs can only extract a field from the scanned dataset.
> It'd be helpful if they could also extract a field from an Expression."
>
> nealrichardson:
> "Agree that it would be helpful (possibly necessary) to be able to extract
> a field from an Expression more generally."

That thread was closed after resolving the inverse direction (constructing
nested refs from Python); this issue tracks the missing direction:
extracting field refs from an existing `Expression`.

### Proposed API

```python
def field_refs(self) -> list[str | int | tuple[str | int, ...]]:
    """
    Return the field references contained in this expression.

    Each reference is reported once per occurrence, with no deduplication
    (matching the C++ `FieldsInExpression` semantics). The returned value
    shape mirrors `pyarrow.compute.field()`'s input: by-name references come
    back as `str`, by-index as `int`, and nested references as `tuple`.
    """
```

Round-trip example:

```python
>>> import pyarrow.compute as pc
>>> ((pc.field("a") > 0) & pc.field("b").is_null()).field_refs()
['a', 'b']
>>> pc.field("user", "city").field_refs()
[('user', 'city')]
>>> pc.scalar(5).field_refs()
[]
```

### Open API decisions to settle before implementation

| Decision | Proposed | Rationale |
|---|---|---|
| Method name | `field_refs()` | Plural of the C++ accessor `Expression::field_ref()`, and a thin wrapper over the free function `FieldsInExpression`. Alternatives: `references()`, `referenced_fields()`. |
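| Return type | `list[str \| int \| tuple]` | Round-trip compatible with `pc.field(ref)`, which accepts a `str`, `int`, or `tuple` directly (unpacking with `pc.field(*ref)` would split a multi-character name into single characters). Avoids introducing a new public `FieldRef` Python type (which would deserve its own design discussion, likely as a follow-up). |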
| Dedup | No | Matches C++ `FieldsInExpression`. Callers do `set(...)` if desired. |
| Order | Traversal (depth-first, left-to-right) | Documented as not part of the public contract, leaving room to change the traversal later. |
| Single-element FieldPath | Plain `int`, not `(int,)` | Symmetric with `pc.field(3)` returning a non-nested ref. |

Happy to defer any of these to maintainer preference.
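The dedup and order rows can be illustrated with a toy model. The sketch below uses hypothetical stand-in classes (`Field`, `Literal`, `Call`) rather than pyarrow types, and walks the tree depth-first, left to right, without deduplication, which is the behavior proposed to mirror `FieldsInExpression`. Callers who want unique references can use `set(...)`, or `dict.fromkeys(...)` to keep first-seen order.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for an expression tree; not pyarrow classes.
@dataclass(frozen=True)
class Field:
    ref: Union[str, int, tuple]


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Call:
    function: str
    arguments: tuple


Expr = Union[Field, Literal, Call]


def field_refs(expr: Expr) -> list:
    """Depth-first, left-to-right, one entry per occurrence (no dedup)."""
    if isinstance(expr, Field):
        return [expr.ref]
    if isinstance(expr, Call):
        refs = []
        for arg in expr.arguments:
            refs.extend(field_refs(arg))
        return refs
    return []  # literals reference no fields


# Models (a > 0) & ((b < 5) | (a == 2)); "a" appears twice.
expr = Call("and_kleene", (
    Call("greater", (Field("a"), Literal(0))),
    Call("or_kleene", (
        Call("less", (Field("b"), Literal(5))),
        Call("equal", (Field("a"), Literal(2))),
    )),
))

refs = field_refs(expr)
print(refs)                       # ['a', 'b', 'a']
print(list(dict.fromkeys(refs)))  # ['a', 'b']
```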

### Implementation outline

Small (~80 lines including tests). Three source and test files touched:

- `python/pyarrow/includes/libarrow.pxd` — declare `FieldsInExpression` and
  the additional `CFieldRef` accessors (`IsName`, `IsFieldPath`, `IsNested`,
  `field_path`, `nested_refs`) needed for the conversion helper.
- `python/pyarrow/_compute.pyx` — add a `_fieldref_to_python` helper and a
  `field_refs()` method on `Expression`. Both small.
- `python/pyarrow/tests/test_compute.py` — coverage for the four FieldRef
  shapes (name / index / nested name / nested index), empty (constant
  expression), and round-trip through `pc.field()`.

Plus one autosummary line in `docs/source/python/api/compute.rst`.
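The conversion rules for `_fieldref_to_python` can be prototyped in plain Python before writing the Cython. In this sketch `FakeFieldRef` is a hypothetical stand-in whose methods are named after the C++ `FieldRef` accessors (`IsName`, `IsFieldPath`, `IsNested`, `name`, `field_path`, `nested_refs`); the real helper would call those through the `.pxd` declarations. It applies the single-element `FieldPath` rule from the decisions table and flattens nested references into one tuple, assuming (as in C++) that nested sub-references are only names or paths.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeFieldRef:
    """Hypothetical stand-in mimicking the C++ FieldRef accessors."""
    _name: Optional[str] = None
    _path: Optional[tuple] = None
    _nested: Optional[list] = None

    def IsName(self):
        return self._name is not None

    def IsFieldPath(self):
        return self._path is not None

    def IsNested(self):
        return self._nested is not None

    def name(self):
        return self._name

    def field_path(self):
        return self._path

    def nested_refs(self):
        return self._nested


def fieldref_to_python(ref: FakeFieldRef) -> Union[str, int, tuple]:
    """Map a FieldRef onto the shape pc.field() accepts."""
    if ref.IsName():
        return ref.name()
    if ref.IsFieldPath():
        path = tuple(ref.field_path())
        # Single-element paths collapse to a plain int, like pc.field(3).
        return path[0] if len(path) == 1 else path
    if ref.IsNested():
        parts = []
        for sub in ref.nested_refs():
            # Sub-references are names or paths; flatten paths into indices.
            if sub.IsName():
                parts.append(sub.name())
            else:
                parts.extend(sub.field_path())
        return tuple(parts)
    raise TypeError("unsupported FieldRef variant")


print(fieldref_to_python(FakeFieldRef(_name="a")))   # a
print(fieldref_to_python(FakeFieldRef(_path=(3,))))  # 3
print(fieldref_to_python(FakeFieldRef(_nested=[
    FakeFieldRef(_name="user"), FakeFieldRef(_name="city")])))  # ('user', 'city')
```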

I'm happy to put up a PR once the API is agreed.

### Related issues

- [#27160](https://github.com/apache/arrow/issues/27160) — closed; this
  issue captures the unfiled follow-up.
- [#34433](https://github.com/apache/arrow/issues/34433) — adjacent;
  asks for `table.evaluate(expr)` returning a boolean mask. Both are
  "more handles on `Expression`" requests but distinct in scope.
- [#49885](https://github.com/apache/arrow/issues/49885) — adjacent;
  binding unresolved Substrait expressions. Complementary work on the
  Expression API.

### Component(s)

Python

