Use frozenset for OMML direct tag lookup #1919

Open
Amitanand983 wants to merge 1 commit into microsoft:main from Amitanand983:refactor/omml-frozenset-lookup

@Amitanand983

Summary

Changes __direct_tags from a tuple to a frozenset in the DOCX OMML math converter (oMath2Latex) to make membership lookups cheaper in a hot code path.

Motivation

process_unknow() checks if stag in self.__direct_tags while recursively traversing OMML nodes.
For math-heavy documents, this check executes frequently.

  • tuple membership is a linear scan: O(n)
  • frozenset membership is hash-based: O(1) on average

frozenset is immutable like tuple, and better represents an unordered constant set used only for membership checks.
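
The difference between the two lookups can be checked with a small sketch. The tag names below are hypothetical placeholders, not the actual contents of __direct_tags, and the helper names are made up for illustration. With only a handful of tags the measured gap may be small, so the timings are printed without any expected values.

```python
import timeit

# Hypothetical tag names for illustration only; the real contents of
# __direct_tags are not listed in this description.
TAGS_TUPLE = ("box", "sSub", "sSup", "sSubSup", "num", "den", "deg", "e")
TAGS_FROZENSET = frozenset(TAGS_TUPLE)


def same_membership(candidates):
    """Return True if both containers agree on membership for every candidate."""
    return all((c in TAGS_TUPLE) == (c in TAGS_FROZENSET) for c in candidates)


def time_lookup(container, probe, number=200_000):
    """Time `probe in container` over `number` repetitions, in seconds."""
    return timeit.timeit(lambda: probe in container, number=number)


if __name__ == "__main__":
    # "e" is the last tuple element, so the tuple scan compares every entry.
    # "missing" is absent, which is also a full scan for the tuple.
    for probe in ("e", "missing"):
        t_tuple = time_lookup(TAGS_TUPLE, probe)
        t_set = time_lookup(TAGS_FROZENSET, probe)
        print(f"{probe!r}: tuple={t_tuple:.4f}s frozenset={t_set:.4f}s")
```

Both containers give the same answer for every probe; only the cost of reaching that answer differs.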

Change

  • packages/markitdown/src/markitdown/converter_utils/docx/math/omml.py
    • __direct_tags changed from tuple to frozenset
  • No API or behavior change
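
As a rough sketch of the shape of this change, the two classes below stand in for the converter before and after. The class names, the method name is_direct, and the tag values are all hypothetical; only the __direct_tags attribute name and the `stag in self.__direct_tags` check come from the description above.

```python
class TupleConverter:
    """Stand-in for the 'before' state: class-private tuple of tags."""

    # Hypothetical tag values, for illustration only.
    __direct_tags = ("num", "den", "e")

    def is_direct(self, stag):
        # Linear scan over the tuple.
        return stag in self.__direct_tags


class FrozensetConverter:
    """Stand-in for the 'after' state: same tags held in a frozenset."""

    # Same hypothetical values as above.
    __direct_tags = frozenset(("num", "den", "e"))

    def is_direct(self, stag):
        # Hash-based lookup; same truth value as the tuple version.
        return stag in self.__direct_tags


def agree(tags):
    """Return True if both stand-ins classify every tag identically."""
    before, after = TupleConverter(), FrozensetConverter()
    return all(before.is_direct(t) == after.is_direct(t) for t in tags)
```

The swap preserves behavior only as long as the attribute is used purely for membership checks, since a frozenset has no defined order and does not support indexing.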

Validation

  • pre-commit run --all-files: passed (black)
  • hatch test: 331 passed, 4 skipped, 1 pre-existing unrelated failure (test_speech_transcription)

Risk

  • Single-file, single-concept change
  • No change to converter output semantics

Replace __direct_tags tuple with frozenset in oMath2Latex for O(1) membership checks in process_unknow(). No behavioral change.