Describe the enhancement requested
CorruptStatistics.shouldIgnoreStatistics(String createdBy, PrimitiveTypeName columnType) performs VersionParser.parse() and SemanticVersion.parse() on every invocation. The createdBy string is constant per file (from FileMetaData.created_by), but this method is called once per column chunk per row group during ParquetMetadataConverter.fromParquetMetadata().
For a file with R row groups and C columns, the version string is parsed R×C times during file open — all yielding the same result.
Impact
High CPU usage during ParquetFileReader construction on files with many row groups and columns.
Proposed fix
Compute shouldIgnoreStatistics once per file before the row group loop in fromParquetMetadata, and pass the pre-computed boolean through buildColumnChunkMetaData → fromParquetStatisticsInternal.
Since buildColumnChunkMetaData and fromParquetStatisticsInternal are public/package-level API methods whose signatures are checked for compatibility by japicmp-maven-plugin, we will add overloaded methods that accept a boolean shouldIgnoreBinaryStats parameter rather than changing existing signatures. The existing String createdBy signatures remain for backward compatibility and delegate to the new overloads.
The page-level path in ParquetFileReader.Chunk.readAllPages() also calls shouldIgnoreStatistics via fromParquetStatisticsInternal, but is not in scope for this fix — at read time the cost is masked by I/O, decompression, and decoding. The footer path during file open is where R×C calls happen in a tight loop with no I/O to amortize the cost.
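A minimal sketch of the hoisting and overload pattern described above, using simplified stand-ins rather than the real parquet-java classes. Everything here is an assumption made for illustration: the class StatsCheckSketch, the ColumnType enum, the statsUsable and countUsable methods, the "legacy-writer" rule, and the idea that the pre-computed flag is combined with a per-column binary-type check. The counter only stands in for the cost of VersionParser.parse() and SemanticVersion.parse().

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative stand-in for the converter; not the actual parquet-java code. */
class StatsCheckSketch {
    /** Hypothetical, reduced column type set. */
    enum ColumnType { INT32, BINARY }

    /** Counts how often the (assumed expensive) version check runs. */
    static final AtomicInteger versionParses = new AtomicInteger();

    /** Stand-in for CorruptStatistics.shouldIgnoreStatistics; the rule is a placeholder. */
    static boolean shouldIgnoreStatistics(String createdBy, ColumnType type) {
        if (type != ColumnType.BINARY) {
            return false;
        }
        versionParses.incrementAndGet(); // represents parsing the created_by string
        return createdBy != null && createdBy.startsWith("legacy-writer");
    }

    /** Existing-style signature: kept for compatibility, delegates to the new overload. */
    static boolean statsUsable(String createdBy, ColumnType type) {
        return statsUsable(shouldIgnoreStatistics(createdBy, ColumnType.BINARY), type);
    }

    /** New overload: takes the pre-computed flag, so no parsing happens here. */
    static boolean statsUsable(boolean shouldIgnoreBinaryStats, ColumnType type) {
        return !(shouldIgnoreBinaryStats && type == ColumnType.BINARY);
    }

    /** Footer-style loop: the flag is computed once per file, before the row group loop. */
    static int countUsable(String createdBy, ColumnType[] columns, int rowGroups) {
        boolean ignoreBinary = shouldIgnoreStatistics(createdBy, ColumnType.BINARY);
        int usable = 0;
        for (int r = 0; r < rowGroups; r++) {
            for (ColumnType column : columns) {
                if (statsUsable(ignoreBinary, column)) {
                    usable++;
                }
            }
        }
        return usable;
    }
}
```

In this sketch the loop in countUsable triggers one version check regardless of the number of row groups and columns, while callers of the String-based statsUsable keep working unchanged and pay the check on each call, as before.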
Profiling data
shouldIgnoreStatistics accounts for ~65% of fromParquetMetadata CPU time across multiple samples.

Component(s)
Core