Skip to content

fix(trigger): fix Google Sheets trigger header detection and row index tracking#4109

Merged
waleedlatif1 merged 6 commits intostagingfrom
worktree-fix+google-sheets-trigger-header-rowindex
Apr 11, 2026
Merged

fix(trigger): fix Google Sheets trigger header detection and row index tracking#4109
waleedlatif1 merged 6 commits intostagingfrom
worktree-fix+google-sheets-trigger-header-rowindex

Conversation

@waleedlatif1
Copy link
Copy Markdown
Collaborator

Summary

  • Auto-detect header row by scanning the first 10 rows for the first non-empty row, instead of hardcoding row 1 — fixes `row: null, headers: []` for sheets where headers aren't in row 1
  • Replace `lastKnownRowCount` with `lastIndexChecked` (1-indexed row pointer) for accurate new-row detection
  • Skip header row from data emission using `adjustedStartRow` so it never fires a workflow run
  • Use full A:Z fetch for sheet state so row count is accurate regardless of which columns have data
  • Preserve `lastModifiedTime` when remaining rows exist beyond the batch cap, preventing the Drive pre-check from skipping them on the next cycle
  • Skip empty rows in `processRows` to avoid firing workflow runs with no data

Type of Change

  • Bug fix

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…lastIndexChecked

- Replace hardcoded !1:1 header fetch with detectHeaderRow(), which scans
  the first 10 rows and returns the first non-empty row as headers. This
  fixes row: null / headers: [] when a sheet has blank rows or a title row
  above the actual column headers (e.g. headers in row 3).
- Rename lastKnownRowCount → lastIndexChecked in GoogleSheetsWebhookConfig
  and all usage sites to clarify that the value is a row index pointer, not
  a total count.
- Remove config parameter from processRows() since it was unused after the
  includeHeaders flag was removed.
…data emission

- Replace separate getDataRowCount() + detectHeaderRow() with a single
  fetchSheetState() call that returns rowCount, headers, and headerRowIndex
  from one A:Z fetch. Saves one Sheets API round-trip per poll cycle when
  new rows are detected.
- Use headerRowIndex to compute adjustedStartRow, preventing the header row
  (and any blank rows above it) from being emitted as data events when
  lastIndexChecked was seeded from an empty sheet.
- Handle the edge case where the entire batch falls within the header/blank
  window by advancing the pointer and returning early without fetching rows.
- Skip empty rows (row.length === 0) in processRows rather than firing a
  workflow run with no meaningful data.
…er header skip

When all rows in a batch fall within the header/blank window (adjustedStartRow
> endRow), the early return was unconditionally updating lastModifiedTime to the
current value. If there were additional rows beyond the batch cap, the next
Drive pre-check would see an unchanged modifiedTime and skip polling entirely,
leaving those rows unprocessed. Mirror the hasRemainingOrFailed pattern from the
normal processing path.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 11, 2026 6:30pm

Request Review

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 11, 2026

PR Summary

Medium Risk
Changes polling state semantics (lastKnownRowCount -> lastIndexChecked) and header/row filtering, which could cause one-time missed or duplicate events for existing webhooks if stored provider config isn’t migrated or edge cases exist around blank/header rows.

Overview
Fixes Google Sheets polling to auto-detect headers (scan first 10 rows for the first non-empty row) and fetch sheet state from A:Z so row counting doesn’t miss data in non-A columns.

Replaces lastKnownRowCount with a 1-indexed lastIndexChecked pointer, ensures the detected header row is never emitted as a data event, skips empty rows during processing, and preserves lastModifiedTime when a poll hits the per-batch cap so remaining rows aren’t skipped next cycle.

Reviewed by Cursor Bugbot for commit 9b3af87. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR fixes several correctness issues in the Google Sheets polling trigger: auto-detecting the header row (scanning up to 10 rows instead of hardcoding row 1), replacing the old lastKnownRowCount counter with a 1-indexed lastIndexChecked pointer, skipping the header and pre-header blank rows from data emission via adjustedStartRow, preserving lastModifiedTime so the Drive pre-check doesn't skip remaining batches, and removing the spurious processedCount increment for skipped empty rows. All three concerns raised in the previous review threads have been addressed.

Confidence Score: 5/5

Safe to merge; all previously raised concerns are resolved and the updated logic handles edge cases correctly.

Thorough review of the single changed file found no P0 or P1 issues. The adjustedStartRow offset, pointer advancement by rowsToFetch, empty-row skip, lastModifiedTime preservation, and first-poll seeding all behave correctly across the edge cases examined (empty leading rows, header mid-sheet, partial-failure retry, batch cap with remaining rows). All three threads from the previous review cycle are resolved.

No files require special attention.

Important Files Changed

Filename Overview
apps/sim/lib/webhooks/polling/google-sheets.ts Rewrites header detection to auto-scan first 10 rows, replaces lastKnownRowCount with lastIndexChecked, skips header row from emission via adjustedStartRow, preserves lastModifiedTime for remaining batches, and drops empty-row processedCount increment; logic is correct across the edge cases reviewed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([pollWebhook]) --> B[resolveOAuthCredential]
    B --> C[isDriveFileUnchanged]
    C -->|unchanged| D[updateConfig lastCheckedTimestamp\nmarkWebhookSuccess]
    C -->|changed| E[fetchSheetState A:Z\nrowCount · headers · headerRowIndex]
    E --> F{lastIndexChecked\nundefined?}
    F -->|yes — first poll| G[seed lastIndexChecked = rowCount\nmarkWebhookSuccess]
    F -->|no| H{currentRowCount\n<= lastIndexChecked?}
    H -->|yes — no new rows| I[update lastIndexChecked = currentRowCount\nmarkWebhookSuccess]
    H -->|no — new rows found| J[compute startRow · endRow · rowsToFetch\nadjustedStartRow = max startRow headerRowIndex+1]
    J --> K{adjustedStartRow\n> endRow?}
    K -->|yes — batch is all header/blank| L[advance pointer by rowsToFetch\npreserve lastModifiedTime if remaining\nmarkWebhookSuccess]
    K -->|no| M[fetchRowRange adjustedStartRow→endRow]
    M --> N[processRows\nskip empty rows\nfire workflow per row via idempotency]
    N --> O{failedCount > 0?}
    O -->|all failed| P[markWebhookFailed\ndo NOT advance pointer\nreturn failure]
    O -->|partial or none| Q[advance pointer by rowsToFetch\npreserve lastModifiedTime if remaining or failed\nmarkWebhookSuccess]
Loading

Reviews (3): Last reviewed commit: "fix(trigger): don't count skipped empty ..." | Re-trigger Greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@greptile

@waleedlatif1
Copy link
Copy Markdown
Collaborator Author

@cursor review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 9b3af87. Configure here.

@waleedlatif1 waleedlatif1 merged commit 74d0a47 into staging Apr 11, 2026
12 checks passed
@waleedlatif1 waleedlatif1 deleted the worktree-fix+google-sheets-trigger-header-rowindex branch April 11, 2026 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant