feat(source-hubspot): capture CRM deletions as in-stream tombstones #2

Open
vklimontovich wants to merge 1 commit into hubspot-deletions-base from source-hubspot/capture-crm-deletions

@vklimontovich vklimontovich commented Jul 1, 2026

What

Adds deletion capture to source-hubspot. HubSpot CRM streams are read incrementally through
POST /crm/v3/objects/{entity}/search, but that endpoint never returns archived (deleted) records.
Once an object is deleted in HubSpot, its row silently stops updating and the stale copy lives forever;
deletions are invisible on Incremental Append. (See airbytehq#47198 and airbytehq#40595.)

This adds a HubspotDeletionRetriever custom component. For each opted-in CRM search stream it runs the
normal live search pass plus a second pass against GET /crm/v3/objects/{entity}?archived=true, and
emits each archived object back into the same stream as a CDC tombstone:

  • same primary key id
  • the full flattened properties_* set (so a destination merge does not NULL-overwrite columns)
  • _ab_cdc_deleted_at = archivedAt (fallback createdAt)
  • updatedAt = archivedAt (so the tombstone sorts after the live row for PK dedup)
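
As a rough illustration of the field mapping listed above, here is a minimal sketch. It is not the code in this PR: build_tombstone is a hypothetical helper name, and it assumes the archived object arrives as a dict with id, a nested properties object, createdAt and archivedAt.

```python
from typing import Any, Dict


def build_tombstone(archived: Dict[str, Any]) -> Dict[str, Any]:
    """Map one archived CRM object to a tombstone record (hypothetical helper)."""
    # archivedAt is the deletion time; createdAt is the fallback named in the description.
    deleted_at = archived.get("archivedAt") or archived.get("createdAt")

    # Same primary key as the live row, so PK dedup matches the two.
    record: Dict[str, Any] = {"id": archived["id"]}

    # Flatten nested properties into properties_<name> columns so a destination
    # merge does not overwrite existing columns with NULL.
    for name, value in (archived.get("properties") or {}).items():
        record[f"properties_{name}"] = value

    # CDC deletion marker, plus a cursor value that sorts after the live row.
    record["_ab_cdc_deleted_at"] = deleted_at
    record["updatedAt"] = deleted_at
    return record
```

For an input with id "42" and archivedAt "2026-06-30T12:00:00Z", the result carries id "42" and both _ab_cdc_deleted_at and updatedAt set to that timestamp.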

Tombstones are emitted exactly once per sync. Emission is lock-guarded because the concurrent path
shares one retriever across time-slices. The archived pass is best-effort: if the archived endpoint is
unavailable for a scope, it logs a warning and continues, never failing the live sync.
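
The once-per-sync, best-effort behavior can be sketched roughly as follows. This is an assumption-laden illustration, not the HubspotDeletionRetriever itself: the class name OncePerSyncArchivedPass, the fetch_archived callable and the logger name are all hypothetical, and the real component may stream pages rather than materialize them.

```python
import logging
import threading
from typing import Any, Callable, Dict, Iterable, Iterator

logger = logging.getLogger("hubspot_deletions_sketch")  # hypothetical logger name


class OncePerSyncArchivedPass:
    """Runs the archived pass at most once, even if read() is called per time-slice."""

    def __init__(self, fetch_archived: Callable[[], Iterable[Dict[str, Any]]]) -> None:
        self._fetch_archived = fetch_archived  # hypothetical callable hitting ?archived=true
        self._lock = threading.Lock()
        self._done = False

    def read(self) -> Iterator[Dict[str, Any]]:
        # Claim the pass under the lock; every later caller yields nothing.
        with self._lock:
            if self._done:
                return
            self._done = True
        try:
            # Materialize inside the try so errors raised while paging are caught too.
            records = list(self._fetch_archived())
        except Exception as exc:
            # Best-effort: warn and move on instead of failing the live sync.
            logger.warning("Archived pass skipped: %s", exc)
            return
        yield from records
```

The lock is held only while the flag is flipped, so slow archived paging does not block other slices. Materializing the list trades partial results for simpler error handling, which is a choice of this sketch rather than something the PR states.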

Covered streams

companies, contacts, deals, leads, engagements_calls, engagements_emails,
engagements_notes, engagements_tasks. The existing deals_archived stream is left unchanged.

Deferred (and why)

tickets, engagements_meetings, deal_splits, goals: HubSpot rejects archived paging for these
("Paging through deleted objects is not yet supported" / 403). products, line_items: left for a follow-up.

Design note for maintainers

In-stream tombstones need no new streams or catalog changes and work with existing incremental+dedup
destinations. The repo already has a precedent for the separate *_archived stream approach (e.g.
deals_archived), which is what airbytehq#40595 literally asks for. If you'd prefer that direction instead of/in
addition to CDC tombstones, happy to adjust — flagging before it lands.

Testing

  • New unit_tests/test_hubspot_deletion_retriever.py: an archived record becomes a tombstone with
    _ab_cdc_deleted_at == updatedAt == archivedAt, the same id, and the full flattened properties_*.
  • Manifest parses and spec emits; all 8 wired streams resolve HubspotDeletionRetriever.
  • Validated against a live HubSpot account: companies/deals/contacts tombstones with correct markers,
    no duplicates, full properties.

Version

Bumps dockerImageTag from 6.7.0 to 6.8.0 and adds a changelog entry.

The POST /crm/v3/objects/<entity>/search endpoint never returns archived
records, so deletions are invisible on incremental syncs. Add a
HubspotDeletionRetriever that runs the normal live search pass plus a second
GET /crm/v3/objects/<entity>?archived=true pass, emitting archived objects
into the same stream as tombstones carrying the same PK id, full properties_*
columns, _ab_cdc_deleted_at = archivedAt (fallback createdAt), and
updatedAt = archivedAt.

Wired for companies, contacts, deals, leads, engagements_calls,
engagements_emails, engagements_notes, engagements_tasks.

Refs airbytehq#47198, airbytehq#40595

@vklimontovich force-pushed the source-hubspot/capture-crm-deletions branch from 1f19301 to f6a5664 on July 1, 2026 11:47