feat(csv): make parseLine the synchronous primitive (refs #3765)#7118
MukundaKatta wants to merge 2 commits into
Refactor the CSV parser so a single synchronous parseLine handles all field-level rules, with parse() (sync) and CsvParseStream (async) becoming thin line-iteration shells on top of it.

- _io.ts: introduce sync parseLine; rewrite the existing async parseRecord as a thin reader.readLine accumulator that delegates to parseLine. Error column tracking now resolves through embedded newlines so error messages stay correct for multi-line quoted records.
- parse.ts: drop the duplicate field-parsing loop that lived inside Parser.#parseRecord; both Parser and the new public parseLine share the same primitive. Public parseLine has the simple (line, options) -> string[] signature requested in denoland#3765, including BOM strip and trailing CR/LF/CRLF normalization.
- parse_test.ts: add 12 parseLine-specific tests covering happy path, custom separator, escapes, BOM, trailing newlines, multi-line quoted body, lazyQuotes, comment lines, and unclosed-field error.

All 133 existing parse + parse_stream tests still pass; new tests bring the total to 145.
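To make the (line, options) -> string[] shape concrete, here is a minimal sketch of a synchronous line parser. It is not the code in this PR: the name parseLineSketch, the SketchOptions type, and the error message are hypothetical, and it only covers the separator, doubled-quote escapes, BOM strip, and trailing CR/LF/CRLF normalization (no lazyQuotes, comment, or trim handling).

```typescript
// Hypothetical sketch, not the PR's implementation.
interface SketchOptions {
  separator?: string; // assumed to be a single character; defaults to ","
}

function parseLineSketch(line: string, options: SketchOptions = {}): string[] {
  const separator = options.separator ?? ",";
  // Strip a leading byte order mark, written in escaped form.
  if (line.startsWith("\uFEFF")) line = line.slice(1);
  // Drop one trailing line terminator: CRLF, LF, or CR.
  line = line.replace(/(\r\n|\n|\r)$/, "");

  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  let i = 0;
  while (i < line.length) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') {
          field += '"'; // a doubled quote is an escaped quote
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
      } else {
        field += ch; // embedded newlines are kept as field content
      }
    } else if (ch === '"' && field.length === 0) {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === separator) {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
    i++;
  }
  if (inQuotes) throw new SyntaxError("unclosed quoted field");
  fields.push(field);
  return fields;
}
```

With this sketch, parseLineSketch('a,"b ""q"" c",d\r\n') yields the three fields a, b "q" c, and d, and a quoted field may span an embedded newline as long as the closing quote is part of the same input string.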
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #7118 | +/- |
|---|---|---|---|
| Coverage | 94.61% | 94.61% | |
| Files | 634 | 634 | |
| Lines | 51799 | 51769 | -30 |
| Branches | 9329 | 9327 | -2 |
| Hits | 49009 | 48982 | -27 |
| Misses | 2216 | 2211 | -5 |
| Partials | 574 | 576 | +2 |
The deno_lint no-unused-vars check flagged the parameter on parseLine (and the matching one on parseRecord and Parser.#parseRecord) — it was threaded through but never read inside the function bodies because the locate() helper computes line offsets from embedded newlines in the joined fullLine instead. Removing the param simplifies the call sites without changing behavior: all 145 parse + parse_stream + parseLine tests still pass.
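For readers unfamiliar with the approach, the idea of deriving a position from embedded newlines can be sketched as follows. This is an illustration only: locateSketch, its return shape, and the 1-based column counted in UTF-16 code units are assumptions, not the actual locate() helper in _io.ts.

```typescript
// Hypothetical sketch of mapping an absolute offset in a joined multi-line
// record back to a line offset and column. Not the helper from _io.ts.
function locateSketch(
  fullLine: string,
  offset: number,
): { lineOffset: number; column: number } {
  let lineOffset = 0; // number of "\n" characters before the offset
  let lineStart = 0; // index just after the most recent "\n"
  for (let i = 0; i < offset && i < fullLine.length; i++) {
    if (fullLine[i] === "\n") {
      lineOffset++;
      lineStart = i + 1;
    }
  }
  // Assumption: columns are 1-based and counted in UTF-16 code units.
  return { lineOffset, column: offset - lineStart + 1 };
}
```

Because the line offset falls out of the newline scan, a caller only needs the line number where the record started and can add lineOffset to it, which is consistent with a separately threaded position parameter going unread.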
@std/csv is v1.0.6 and csv/mod.ts:184 re-exports everything from ./parse.ts, so the new parseLine lands on the stable surface. Per .github/CONTRIBUTING.md, new public APIs in v1.0.0+ packages need to live in csv/unstable_parse_line.ts, must not be re-exported from mod.ts, and must carry @experimental **UNSTABLE**: New API, yet to be vetted.. Title would shift to feat(csv/unstable): add parseLine.
The underlying refactor — unifying Parser.#parseRecord and the streaming parseRecord onto one field-state machine in _io.ts — is fine; the locate() helper correctly recomputes (line, col) for embedded \n and the EOF branches match the old line.length === 0 / line.length > 0 split.
- nit: parse.ts:14 was previously a \uXXXX-escaped BOM constant; this PR replaces it with the raw U+FEFF character, which renders invisibly in most editors and breaks grep. The new BOM test in parse_test.ts does the same. Please keep the escape form: parse_test.ts already defines a BYTE_ORDER_MARK constant using it at the top of the file.
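A small illustration of the escaped form the nit asks for. The BYTE_ORDER_MARK name comes from the comment above; stripBom is a hypothetical helper added only for this example.

```typescript
// The escape keeps the character visible in source and searchable with grep.
const BYTE_ORDER_MARK = "\uFEFF";

// Hypothetical helper: remove a single leading BOM if present.
function stripBom(text: string): string {
  return text.startsWith(BYTE_ORDER_MARK) ? text.slice(1) : text;
}
```

Tests can then build their input as BYTE_ORDER_MARK + "a,b" instead of pasting the invisible character into a string literal.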
Summary
- Make parseLine the actual internal CSV primitive that both parse() and CsvParseStream build on, addressing the design feedback from #7114 (closed; "feat(csv): add parseLine() convenience for single-line CSV records (refs #3765)") and aligning with the intent of #3765 ("suggestion: investigate simpler CSV-parsing APIs").
- Drop the duplicate field loop from Parser.#parseRecord in parse.ts: both parse() (sync) and the streaming path now share one set of field/quote rules.
- parseLine(line, options) -> string[] is the simple shape #3765 asked for, with BOM strip and trailing CR/LF/CRLF normalization.
What changed
- csv/_io.ts: new sync parseLine carries the whole field-parsing state machine (separator, quotes, escapes, lazyQuotes, comment, trim). The existing async parseRecord becomes a small wrapper that pulls more lines from the LineReader and re-calls parseLine until a record completes. Error column tracking maps absolute positions in the joined input back to (line, column) so multi-line quoted records still report the right line.
- csv/parse.ts: drop Parser.#parseRecord's duplicate field loop; Parser now defers to parseLine from _io.ts. Add the public parseLine export with a clean (line, options) signature.
- csv/parse_test.ts: 12 new tests pin parseLine behavior (happy path, custom separator, escaped quotes, BOM, trailing newline, multi-line quoted body, lazyQuotes, comment, unclosed-field error).
Test plan
- Confirm that parseLine's public surface matches the spirit of #3765 and that the (line, options) shape is what was wanted.

cc @bartlomieju: this replaces #7114 with the design you sketched in the review there.
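For reviewers skimming the description, the "pull more lines until the record completes" wrapper can be sketched as below. This is a synchronous stand-in under stated assumptions: the PR's parseRecord is async and awaits the reader, LineReaderSketch is a hypothetical interface rather than the LineReader in _io.ts, and completeness is detected here by quote parity instead of by re-calling parseLine.

```typescript
// Hypothetical reader shape; the real LineReader in _io.ts is async and may differ.
interface LineReaderSketch {
  readLine(): string | null; // null signals end of input
}

// With doubled-quote escaping, an odd number of quote characters means a
// quoted field is still open at the end of the text.
function hasOpenQuote(text: string): boolean {
  let quotes = 0;
  for (const ch of text) if (ch === '"') quotes++;
  return quotes % 2 === 1;
}

// Accumulate physical lines until they form one complete logical record.
function readRecordText(reader: LineReaderSketch): string | null {
  let record = reader.readLine();
  if (record === null) return null; // nothing left to read
  while (hasOpenQuote(record)) {
    const next = reader.readLine();
    if (next === null) throw new SyntaxError("unclosed quoted field at EOF");
    record += "\n" + next; // re-join with the newline the reader removed
  }
  return record;
}

// Tiny in-memory reader used only to exercise the sketch.
function arrayReader(lines: string[]): LineReaderSketch {
  let i = 0;
  return { readLine: () => (i < lines.length ? lines[i++] : null) };
}
```

Feeding it the lines a,"x then y",b then c,d returns the joined two-line record first and c,d second; each returned string is what a line-level primitive would then split into fields.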