feat(source-hubspot): capture CRM deletions as in-stream tombstones #2
Open
vklimontovich wants to merge 1 commit into
The POST /crm/v3/objects/<entity>/search endpoint never returns archived records, so deletions are invisible on incremental syncs. Add a HubspotDeletionRetriever that runs the normal live search pass plus a second GET /crm/v3/objects/<entity>?archived=true pass, emitting archived objects into the same stream as tombstones carrying the same PK id, full properties_* columns, _ab_cdc_deleted_at = archivedAt (fallback createdAt), and updatedAt = archivedAt. Wired for companies, contacts, deals, leads, engagements_calls, engagements_emails, engagements_notes, engagements_tasks. Refs airbytehq#47198, airbytehq#40595
1f19301 to f6a5664
What
Adds deletion capture to source-hubspot. HubSpot CRM streams are read incrementally through POST /crm/v3/objects/{entity}/search, but that endpoint never returns archived (deleted) records, so once an object is deleted in HubSpot it silently stops updating and the stale row lives forever; deletions are invisible on Incremental Append. (See airbytehq#47198 and airbytehq#40595.)
This adds a HubspotDeletionRetriever custom component. For each opted-in CRM search stream it runs the normal live search pass plus a second pass against GET /crm/v3/objects/{entity}?archived=true, and emits each archived object back into the same stream as a CDC tombstone carrying:
- the same PK id
- the full properties_* set (so a destination merge does not NULL-overwrite columns)
- _ab_cdc_deleted_at = archivedAt (fallback createdAt)
- updatedAt = archivedAt (so the tombstone sorts after the live row for PK dedup)
Tombstones are emitted exactly once per sync (lock-guarded, because the concurrent path shares one retriever across time-slices). The archived pass is best-effort: if the archived endpoint is unavailable for a scope, it logs a warning and continues, never failing the live sync.
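The mapping and the once-per-sync guard described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `to_tombstone`, `OncePerSyncArchivedPass`, and the `fetch_archived` callable are hypothetical names, and the sketch assumes `updatedAt` reuses the same fallback value as `_ab_cdc_deleted_at` when `archivedAt` is missing.

```python
import logging
import threading
from typing import Any, Callable, Dict, Iterable, Iterator

logger = logging.getLogger("hubspot_deletions_sketch")  # hypothetical logger name

Record = Dict[str, Any]


def to_tombstone(archived: Record) -> Record:
    """Map one archived HubSpot object to an in-stream tombstone row."""
    # Prefer archivedAt; fall back to createdAt when it is missing.
    deleted_at = archived.get("archivedAt") or archived.get("createdAt")
    row: Record = {"id": archived["id"]}
    # Flatten properties into properties_* columns so a destination merge
    # does not overwrite existing columns with NULL.
    for name, value in (archived.get("properties") or {}).items():
        row[f"properties_{name}"] = value
    row["_ab_cdc_deleted_at"] = deleted_at
    # Assumption: updatedAt takes the same value, so the tombstone sorts
    # after the last live version of the row.
    row["updatedAt"] = deleted_at
    return row


class OncePerSyncArchivedPass:
    """Runs the archived pass at most once, even when shared across threads."""

    def __init__(self, fetch_archived: Callable[[], Iterable[Record]]) -> None:
        # Hypothetical callable that yields archived objects for one entity.
        self._fetch_archived = fetch_archived
        self._lock = threading.Lock()
        self._done = False

    def tombstones(self) -> Iterator[Record]:
        # Only the first caller proceeds; later callers get an empty iterator.
        with self._lock:
            if self._done:
                return
            self._done = True
        try:
            for archived in self._fetch_archived():
                yield to_tombstone(archived)
        except Exception as exc:
            # Best-effort: log and continue instead of failing the live sync.
            logger.warning("Archived pass skipped: %s", exc)
```

The guard flips its flag before fetching, so a failed archived pass is not retried by another time-slice in the same sync; whether the real component behaves that way is not stated here.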
Covered streams
companies, contacts, deals, leads, engagements_calls, engagements_emails, engagements_notes, engagements_tasks. The existing deals_archived stream is left unchanged.
Deferred (and why)
- tickets, engagements_meetings, deal_splits, goals: HubSpot rejects archived paging for these ("Paging through deleted objects is not yet supported" / 403).
- products, line_items: follow-up.
Design note for maintainers
In-stream tombstones need no new streams or catalog changes and work with existing incremental+dedup destinations. The repo already has a precedent for the separate *_archived stream approach (e.g. deals_archived), which is what airbytehq#40595 literally asks for. If you'd prefer that direction instead of, or in addition to, CDC tombstones, happy to adjust; flagging before it lands.
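To illustrate why in-stream tombstones fit an incremental+dedup destination, here is a minimal in-memory sketch of primary-key deduplication. It is not how any particular destination implements it (real destinations typically do this in SQL); `dedup_latest` is a hypothetical helper, and it assumes `updatedAt` values are ISO 8601 strings in one consistent UTC format, so plain string comparison orders them correctly.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def dedup_latest(rows: Iterable[Row], pk: str = "id", cursor: str = "updatedAt") -> List[Row]:
    """Keep the newest row per primary key, then drop keys that end in a tombstone."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row[pk])
        # >= lets a later-arriving row win when cursor values tie.
        if current is None or row[cursor] >= current[cursor]:
            latest[row[pk]] = row
    # A winning row with _ab_cdc_deleted_at set means the object was deleted.
    return [r for r in latest.values() if r.get("_ab_cdc_deleted_at") is None]


rows = [
    {"id": "1", "updatedAt": "2024-01-01T00:00:00Z", "properties_name": "Acme"},
    {"id": "2", "updatedAt": "2024-02-01T00:00:00Z", "properties_name": "Globex"},
    # Tombstone for id 1: updatedAt == archivedAt, so it outranks the live row.
    {
        "id": "1",
        "updatedAt": "2024-03-01T00:00:00Z",
        "properties_name": "Acme",
        "_ab_cdc_deleted_at": "2024-03-01T00:00:00Z",
    },
]
surviving = dedup_latest(rows)
```

In this example only the row with id "2" survives, because the tombstone for id "1" carries the later `updatedAt` and wins the per-key comparison.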
Testing
- unit_tests/test_hubspot_deletion_retriever.py: an archived record becomes a tombstone with _ab_cdc_deleted_at == updatedAt == archivedAt, the same id, and the full flattened properties_*.
- spec emits; all 8 wired streams resolve HubspotDeletionRetriever.
- no duplicates, full properties.
Version
Bumps dockerImageTag 6.7.0 → 6.8.0 and adds a changelog entry.