Skip to content

feat(csv): make parseLine the synchronous primitive (refs #3765)#7118

Open
MukundaKatta wants to merge 2 commits into
denoland:mainfrom
MukundaKatta:feat/csv-parse-line-primitive
Open

feat(csv): make parseLine the synchronous primitive (refs #3765)#7118
MukundaKatta wants to merge 2 commits into
denoland:mainfrom
MukundaKatta:feat/csv-parse-line-primitive

Conversation

@MukundaKatta
Copy link
Copy Markdown

Summary

What changed

  • csv/_io.ts: new sync parseLine carries the whole field-parsing state machine (separator, quotes, escapes, lazyQuotes, comment, trim). The existing async parseRecord becomes a small wrapper that pulls more lines from the LineReader and re-calls parseLine until a record completes. Error column tracking maps absolute positions in the joined input back to (line, column) so multi-line quoted records still report the right line.
  • csv/parse.ts: drop Parser.#parseRecord's duplicate field loop; Parser now defers to parseLine from _io.ts. Add the public parseLine export with a clean (line, options) signature.
  • csv/parse_test.ts: 12 new tests pin parseLine behavior (happy path, custom separator, escaped quotes, BOM, trailing newline, multi-line quoted body, lazyQuotes, comment, unclosed-field error).

Test plan

  • All 133 existing parse + parse_stream steps still pass (145 total with the new parseLine tests).
  • Existing error-message assertions (StartLine1, StartLine2, ParseErrorLine, OddQuotes, etc.) preserved with no changes to test expectations.
  • Reviewer to confirm parseLine's public surface matches suggestion: investigate simpler CSV-parsing APIs #3765's spirit and that the (line, options) shape is what was wanted.

cc @bartlomieju — this replaces #7114 with the design you sketched in the review there.

Refactor the CSV parser so a single synchronous parseLine handles all
field-level rules, with parse() (sync) and CsvParseStream (async)
becoming thin line-iteration shells on top of it.

- _io.ts: introduce sync parseLine; rewrite the existing async
  parseRecord as a thin reader.readLine accumulator that delegates to
  parseLine. Error column tracking now resolves through embedded
  newlines so error messages stay correct for multi-line quoted records.
- parse.ts: drop the duplicate field-parsing loop that lived inside
  Parser.#parseRecord; both Parser and the new public parseLine share
  the same primitive. Public parseLine has the simple
  (line, options) -> string[] signature requested in denoland#3765, including
  BOM strip and trailing CR/LF/CRLF normalization.
- parse_test.ts: add 12 parseLine-specific tests covering happy path,
  custom separator, escapes, BOM, trailing newlines, multi-line quoted
  body, lazyQuotes, comment lines, and unclosed-field error.

All 133 existing parse + parse_stream tests still pass; new tests bring
the total to 145.
@github-actions github-actions Bot added the csv label Apr 28, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 91.96429% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.61%. Comparing base (cd03740) to head (ccb627d).

Files with missing lines Patch % Lines
csv/parse.ts 82.60% 6 Missing and 2 partials ⚠️
csv/_io.ts 98.48% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7118   +/-   ##
=======================================
  Coverage   94.61%   94.61%           
=======================================
  Files         634      634           
  Lines       51799    51769   -30     
  Branches     9329     9327    -2     
=======================================
- Hits        49009    48982   -27     
+ Misses       2216     2211    -5     
- Partials      574      576    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The deno_lint no-unused-vars check flagged the parameter on parseLine
(and the matching one on parseRecord and Parser.#parseRecord) — it was
threaded through but never read inside the function bodies because the
locate() helper computes line offsets from embedded newlines in the
joined fullLine instead.

Removing the param simplifies the call sites without changing behavior:
all 145 parse + parse_stream + parseLine tests still pass.
Copy link
Copy Markdown

@fibibot fibibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@std/csv is v1.0.6 and csv/mod.ts:184 re-exports everything from ./parse.ts, so the new parseLine lands on the stable surface. Per .github/CONTRIBUTING.md, new public APIs in v1.0.0+ packages need to live in csv/unstable_parse_line.ts, must not be re-exported from mod.ts, and must carry @experimental **UNSTABLE**: New API, yet to be vetted.. Title would shift to feat(csv/unstable): add parseLine.

The underlying refactor — unifying Parser.#parseRecord and the streaming parseRecord onto one field-state machine in _io.ts — is fine; the locate() helper correctly recomputes (line, col) for embedded \n and the EOF branches match the old line.length === 0 / line.length > 0 split.

  • nit: parse.ts:14 was previously a \uXXXX-escaped BOM constant; this PR replaces it with the raw U+FEFF character, which renders invisibly in most editors and breaks grep. The new BOM test in parse_test.ts does the same. Please keep the escape form — parse_test.ts already defines a BYTE_ORDER_MARK constant using it at the top of the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants