[FLINK-39724][python] Support InternalTypeInfo in PyFlink type extraction#28490

Open
wilmerdooley wants to merge 1 commit into
apache:master from
wilmerdooley:oss/flink-39724

@wilmerdooley wilmerdooley commented Jun 19, 2026

What is the purpose of the change

When a PyFlink DataStream is produced by sources such as CsvReaderFormat, its underlying Java TypeInformation is an InternalTypeInfo wrapping a RowData logical type. The _from_java_type helper in flink-python did not recognize this class, so calls like DataStream.get_type() and DataStream.assign_timestamps_and_watermarks() raised "TypeError: The java type info: ... is not supported in PyFlink currently." Users had to insert an identity map as a workaround.
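For context, the identity-map workaround described above might look roughly like the sketch below. This is an illustration, not code from this PR: `with_explicit_type` is a hypothetical helper name, and it relies only on `DataStream.map` accepting an `output_type` argument.

```python
def with_explicit_type(ds, type_info):
    """Re-declare a stream's type on the Python side via an identity map.

    `ds` is expected to be a pyflink.datastream.DataStream and `type_info`
    a pyflink.common.typeinfo.TypeInformation such as Types.ROW([...]).
    The helper name is hypothetical and used only for illustration.
    """
    # Records pass through unchanged; only the declared output type differs.
    return ds.map(lambda record: record, output_type=type_info)
```

A caller would pass the schema it already knows, for example `with_explicit_type(ds, Types.ROW([Types.STRING(), Types.DOUBLE()]))`, and then call `get_type()` or `assign_timestamps_and_watermarks()` on the returned stream. The extra operator exists only to carry type information, which is the overhead this PR aims to remove.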

Brief change log

  • flink-python/pyflink/common/typeinfo.py: in _from_java_type, detect InternalTypeInfo, convert it back to its logical type, then run it through LogicalTypeDataTypeConverter and LegacyTypeInfoDataTypeConverter to obtain a legacy TypeInformation that maps to a PyFlink type.
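The control flow of the new branch can be modelled in plain Python as below. This is a toy sketch, not the code in typeinfo.py: every class and function here (`LogicalType`, `InternalTypeInfo`, `LegacyTypeInfo`, `logical_to_legacy`, `from_java_type`) is a hypothetical stand-in for the Java-side objects and converters, and the result is a descriptive string rather than a real PyFlink type.

```python
from dataclasses import dataclass
from typing import Tuple


# Stand-ins for Java-side classes. None of these are the real Flink/PyFlink API.
@dataclass(frozen=True)
class LogicalType:
    name: str
    fields: Tuple["LogicalType", ...] = ()


@dataclass(frozen=True)
class InternalTypeInfo:
    logical_type: LogicalType


@dataclass(frozen=True)
class LegacyTypeInfo:
    name: str
    fields: Tuple["LegacyTypeInfo", ...] = ()


# Assumed mapping for this sketch only; the real converters cover many more types.
_LOGICAL_TO_LEGACY = {"VARCHAR": "STRING", "DOUBLE": "DOUBLE", "ROW": "ROW"}


def logical_to_legacy(logical: LogicalType) -> LegacyTypeInfo:
    """Stand-in for the logical-type to legacy-TypeInformation conversion."""
    if logical.name not in _LOGICAL_TO_LEGACY:
        raise TypeError(f"Unsupported logical type in this sketch: {logical.name}")
    children = tuple(logical_to_legacy(f) for f in logical.fields)
    return LegacyTypeInfo(_LOGICAL_TO_LEGACY[logical.name], children)


def from_java_type(j_type_info) -> str:
    """Toy dispatcher returning a description of the matching PyFlink type."""
    if isinstance(j_type_info, InternalTypeInfo):
        # New branch: unwrap to the logical type, convert to a legacy
        # type info, then reuse the existing dispatch on the result.
        return from_java_type(logical_to_legacy(j_type_info.logical_type))
    if isinstance(j_type_info, LegacyTypeInfo):
        if j_type_info.name == "ROW":
            inner = ", ".join(from_java_type(f) for f in j_type_info.fields)
            return f"Types.ROW([{inner}])"
        return f"Types.{j_type_info.name}()"
    # Anything unrecognized still fails, as it did before the change.
    raise TypeError(
        f"The java type info: {j_type_info!r} is not supported in PyFlink currently."
    )
```

The point of the design is that the new branch adds no second type mapping: it reduces InternalTypeInfo to a form the existing dispatch already handles, so nested rows are covered by the same recursion.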

Verifying this change

This change added a unit test: test_internal_type_info in flink-python/pyflink/common/tests/test_typeinfo.py builds an InternalTypeInfo over a RowType(VARCHAR, DOUBLE) and asserts _from_java_type round-trips it to Types.ROW([Types.STRING(), Types.DOUBLE()]). The test fails on the pre-fix code, which raised TypeError.

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no (the change is confined to the private _from_java_type helper in PyFlink)
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no (this runs once during type extraction at stream construction, not per record)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Code

JIRA: https://issues.apache.org/jira/browse/FLINK-39724

…tion

When a PyFlink DataStream is produced by sources such as CsvReaderFormat,
its underlying Java TypeInformation is an InternalTypeInfo wrapping a
RowData logical type. _from_java_type did not recognize this class, so
DataStream.get_type() and assign_timestamps_and_watermarks() raised
TypeError. This adds a branch in _from_java_type that converts
InternalTypeInfo through its logical type back to a legacy TypeInformation
that maps to a PyFlink type, with a unit test covering the round-trip from
InternalTypeInfo to Types.ROW(...).

Signed-off-by: wilmerdooley <wilmerdooley1@gmail.com>
Generated-by: Claude Code

flinkbot commented Jun 19, 2026

Collaborator

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

wilmerdooley changed the title from "FLINK-39724: fix(python): support InternalTypeInfo in _from_java_type" to "[FLINK-39724][python] Support InternalTypeInfo in PyFlink type extraction" on Jun 19, 2026