Skip to content

[BWARE] Speed up frame-to-matrix conversion and harden number parsing#2480

Open
Baunsgaard wants to merge 2 commits into
apache:mainfrom
Baunsgaard:split/doubleParser
Open

[BWARE] Speed up frame-to-matrix conversion and harden number parsing#2480
Baunsgaard wants to merge 2 commits into
apache:mainfrom
Baunsgaard:split/doubleParser

Conversation

@Baunsgaard

Copy link
Copy Markdown
Contributor

Tightens the hot path that converts FrameBlocks of arbitrary schema into MatrixBlocks, with a defensive fallback for malformed cells.

  • DoubleParser.parseFloatingPointLiteral: replace the >= 'I' / >= 'a' character guards with a single 0-9 range check on the last char. The previous guards over-matched and pushed too many strings into the slow Double.parseDouble path
  • DoubleArray.parseDouble: stop wrapping the parse failure as a DMLRuntimeException so callers can distinguish format errors
  • MatrixBlockFromFrame:
    • turn the interface into a class with a private constructor so Jacoco can measure it cleanly
    • on NumberFormatException / DMLRuntimeException during a bulk block convert, log once and fall back to convertSafeCast which writes NaN per offending cell instead of failing the whole job
    • add convertSafeCast / convertBlockSafeCast helpers

Tightens the hot path that converts FrameBlocks of arbitrary schema
into MatrixBlocks, with a defensive fallback for malformed cells.

- DoubleParser.parseFloatingPointLiteral: replace the >= 'I' / >= 'a'
  character guards with a single 0-9 range check on the last char.
  The previous guards over-matched and pushed too many strings into
  the slow Double.parseDouble path
- DoubleArray.parseDouble: stop wrapping the parse failure as a
  DMLRuntimeException so callers can distinguish format errors
- MatrixBlockFromFrame:
    - turn the interface into a class with a private constructor so
      Jacoco can measure it cleanly
    - on NumberFormatException / DMLRuntimeException during a bulk
      block convert, log once and fall back to convertSafeCast which
      writes NaN per offending cell instead of failing the whole job
    - add convertSafeCast / convertBlockSafeCast helpers
@codecov

codecov Bot commented Jun 8, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.40%. Comparing base (88c26e2) to head (759aacc).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...s/runtime/frame/data/lib/MatrixBlockFromFrame.java 94.11% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2480      +/-   ##
============================================
+ Coverage     71.37%   71.40%   +0.02%     
- Complexity    48749    48769      +20     
============================================
  Files          1571     1571              
  Lines        188912   188942      +30     
  Branches      37067    37071       +4     
============================================
+ Hits         134845   134905      +60     
+ Misses        43601    43578      -23     
+ Partials      10466    10459       -7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Frame-to-matrix conversion silently fell back to writing NaN for cells
that cannot be cast to double, logging only a single warning. This
changed behavior for callers that previously failed fast on incompatible
data. Make the lenient behavior opt-in.

- Add sysds.frame.tomatrix.warncast (default false): when false the
  conversion fails fast on number format errors; when true it warns once
  and writes NaN for the incompatible cells
- Read the flag once on the calling thread and pass it down, since the
  thread-local config is not visible to pool workers
- Extract convertStrict to share the contiguous/generic dispatch between
  the strict and warn-only paths
- Add tests for the warn-only fallback, the strict fail-fast default, and
  the tightened DoubleParser trailing-character guard
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

1 participant