Skip to content

perf(tags): batch_mode + per-batch bulk inheritance during import + reorganize#14877

Draft
valentijnscholten wants to merge 14 commits into
DefectDojo:devfrom
valentijnscholten:perf/tag-inheritance-batch-during-import
Draft

perf(tags): batch_mode + per-batch bulk inheritance during import + reorganize#14877
valentijnscholten wants to merge 14 commits into
DefectDojo:devfrom
valentijnscholten:perf/tag-inheritance-batch-during-import

Conversation

@valentijnscholten
Copy link
Copy Markdown
Member

@valentijnscholten valentijnscholten commented May 14, 2026

Summary

Wraps the import / reimport hot loop in tag_inheritance.suppress_tag_inheritance() and bulk-applies inherited Product tags per batch, before post_process_findings_batch dispatches, so rules / deduplication see inherited tags on finding.tags. Pairs with the per-batch import-tag work landed in #14839.

This is the Phase B Stage 2 follow-up to #14811 / #14812 / #14827.

Changes

  • process_findings in DefaultImporter and DefaultReImporter now runs its finding-creation loop inside tag_inheritance.suppress_tag_inheritance(). Per batch — after apply_import_tags_for_batch and before dojo_dispatch_task(post_process_findings_batch, ...) — calls a new helper apply_inherited_tags_for_findings(batch_findings) that bulk-syncs inherited tags on the batch's Findings + the Endpoints (V2) / Locations (V3) reachable from them via FK chain.
  • The signal-driven auto-inheritance entrypoint is renamed inherit_instance_tagsauto_inherit_product_tags and moved from dojo.tags.inheritance to dojo.tags.signals, colocated with the receivers that call it. It is the only inheritance helper that uses is_suppressed() as a gate (others use suppress_tag_inheritance() to wrap their writes), so the gate semantic is exclusive to the signal path.
  • The post_save handlers for child models (Endpoint/Engagement/Test/Finding/Location) and ref models (LocationFindingReference/LocationProductReference) are combined into a single inherit_tags_on_instance handler with sender-based target dispatch + created=True gate. Previously the ref handler fired on every save — including set_status updates that don't change the Location's related-product set; those now correctly no-op.
  • EndpointManager.get_or_create_endpoints replaces its per-row inherit_instance_tags(ep) cleanup loop with a single apply_inherited_tags_for_endpoints(created) bulk call. Inside the importer the per-batch helper already covers these endpoints via Endpoint.status_endpoint.finding; the bulk call is retained as a defensive hook for non-importer callers.
  • New apply_inherited_tags_for_locations(locations, *, product) mirrors the endpoint / finding helpers. LocationManager._bulk_inherit_tags is removed; the persist path delegates to the new helper. Drops a redundant outer suppress_tag_inheritance() (writes inside _sync_inherited_tags are already wrapped) and a redundant bulk pre-fetch of existing inherited tags (the through-table primitive does that read once already).
  • _inherited_tag_names_for_location (the per-Location callback for _sync_inheritance_for_qs) now filters contributing products by their own enable_product_tag_inheritance flag. Previously, tags from flag-off products linked to the same Location could leak in via propagate_tags_on_product_sync and apply_inherited_tags_for_findings.
  • propagate_tags_on_product_sync (used by the product-tag-toggle Celery task) gains an early-exit when neither system-wide nor per-product inheritance is enabled, eliminating ~9 wasted reads per call on inheritance-off products. State transitions (toggling either flag) still trigger a full sweep via their existing signal handlers.
  • Location gains iter_related_products() — a related-manager (self.products + self.findings) implementation returning list[Product]. Callers that pre-issue prefetch_related("products__product__tags", "findings__finding__test__engagement__product__tags") get zero extra queries per Location. The existing JOIN'd all_related_products() is kept for the per-instance signal path where prefetching is not possible.
  • Change the "managing" of inherited tags from a "rebuild every time" to a "diff" based approach where added inherited tags on products are cascaded into the object's tags field, and removed inherited tags on products are also removed from the object's tag. No need anymore to reconstruct user_tags or do complex passing of potential tags. Behavior is still the same, guarded by pre-existing tests.
  • Centralize tag (inheritance) logic into dojo.tags.inheritance. Move the propagate_tags_on_product Celery task into that module alongside its _sync counterpart and delete dojo/product/helpers.py. A propagate_tags_on_product_deprecated alias is kept under the old task name so in-flight tasks complete after upgrade.
  • Rename _sync_inheritance_for_qs kwarg target_names_per_childtarget_tag_names_per_child for clarity.

Query-count impact

Pins in unittests/test_tag_inheritance_perf.py updated to match. "Before" numbers are the pre-Phase-A baselines recorded in source comments next to each constant.

Hot path Before After Δ
ZAP scan import V2 (19 findings) 1378 422 -956
ZAP scan import V3 1256 445 -811
ZAP reimport no-change V2 69 75 +6
ZAP reimport no-change V3 87 101 +14
Product tag add → 100 findings (V2) 4758 91 -4667
Product tag add → 100 findings (V3) 4759 91 -4668
Product tag remove → 100 findings (V2) 4540 53 -4487
Product tag remove → 100 findings (V3) 4541 53 -4488
Product tag add → 100 endpoints (V2) 3958 91 -3867
Product tag remove → 100 endpoints (V2) 3740 53 -3687
Product tag add → 100 locations (V3) 4532 125 -4407
Product tag remove → 100 locations (V3) 4307 75 -4232

…ase B Stage 2)

Wraps the import / reimport hot loop in `tag_inheritance.batch_mode()` and
bulk-applies inherited Product tags per batch *before* `post_process_findings_batch`
dispatches, so rules / deduplication see inherited tags on `finding.tags`.

Changes:

- `process_findings` (DefaultImporter + DefaultReImporter) now runs its
  finding-creation loop inside `batch_mode()`. Per batch, right after
  `apply_import_tags_for_batch`, calls a new helper
  `apply_inherited_tags_for_findings(batch_findings)` that bulk-syncs
  inherited tags on the batch's Findings plus the Endpoints (V2) / Locations
  (V3) reachable from them via FK chain. Inheritance is therefore applied to
  the persisted children before the post-process task dispatches.
- `inherit_instance_tags` in `dojo/tags_signals.py` now early-returns when
  `tag_inheritance.is_in_batch_mode()`, so the batch wrap transparently
  suppresses per-row inheritance work for any caller — including
  `bulk_create` cleanup paths that invoke it manually. `inherit_tags_on_instance`
  post_save delegates to that helper, so the gate also covers signal-driven
  fires.
- `EndpointManager.get_or_create_endpoints` replaces its per-row
  `inherit_instance_tags(ep)` cleanup loop with a single
  `apply_inherited_tags_for_endpoints(created)` bulk call. Inside the importer
  the per-batch helper already covers these endpoints via
  `Endpoint.status_endpoint.finding`; the bulk call is kept as a defensive
  hook for any non-importer caller.
- `propagate_tags_on_product_sync` (used by the product-tag-toggle Celery
  task) gains an early-exit when neither system-wide nor per-product
  inheritance is enabled, eliminating ~9 wasted reads per call on
  inheritance-off products. State transitions (toggling either flag) still
  trigger a full sweep through their existing signal handlers.
- `Location` gains `iter_related_products()`: a related-manager
  (`self.products` + `self.findings`) implementation of `all_related_products()`
  that returns `list[Product]`. Callers that pre-issue
  `prefetch_related("products__product__tags",
  "findings__finding__test__engagement__product__tags")` get zero extra
  queries per Location. The existing JOIN'd `all_related_products()` is kept
  for the per-instance signal path where prefetching is not possible.
- `_inherited_tag_names_for_location` (the per-Location callback used to
  compute the inherited target set) switches to `iter_related_products()`;
  both call sites (`propagate_tags_on_product_sync` V3 branch and
  `apply_inherited_tags_for_findings` V3 branch) now prefetch the chain.

Query-count impact on `unittests/test_tag_inheritance_perf.py` (pins updated
in this commit):

| Hot path                                | Before  | After  | Δ     |
|-----------------------------------------|--------:|-------:|------:|
| ZAP scan import V2 (19 findings)        |    1385 |    477 |  -908 |
| ZAP scan import V3                      |    1263 |    945 |  -318 |
| ZAP reimport no-change V2               |      69 |     75 |    +6 |
| ZAP reimport no-change V3               |      87 |    102 |   +15 |
| Product tag add → 100 locations (V3)    |     316 |    125 |  -191 |
| Product tag remove → 100 locations (V3) |     266 |     75 |  -191 |

Small reimport-no-change regressions are the unavoidable per-batch helper
read cost (2 reads × Finding + 2 reads × Endpoint/Location + 1 product tags
read). Real-work imports drop significantly because per-row
`_manage_inherited_tags` work no longer fires inside the loop.
@valentijnscholten valentijnscholten force-pushed the perf/tag-inheritance-batch-during-import branch from ecd29bb to 8513855 Compare May 14, 2026 23:43
@github-actions
Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link
Copy Markdown
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@valentijnscholten valentijnscholten changed the title perf(tags): batch_mode + per-batch bulk inheritance during import (Phase B Stage 2) perf(tags): batch_mode + per-batch bulk inheritance during import + reorganize May 15, 2026
Move scattered tag-inheritance logic into dojo/tag_inheritance.py without
changing runtime behavior:

- _manage_inherited_tags relocated from dojo/models.py (replaced by a
  lazy-import shim to avoid an import cycle through dojo.utils).
- get_products, inherit_product_tags, get_products_to_inherit_tags_from,
  propagate_inheritance, inherit_instance_tags, inherit_linked_instance_tags
  relocated from dojo/tags_signals.py; receivers there now delegate.
- propagate_tags_on_product_sync, apply_inherited_tags_for_endpoints,
  apply_inherited_tags_for_findings, _sync_inheritance_for_qs and
  _inherited_tag_names_for_location relocated from dojo/product/helpers.py;
  the Celery wrapper propagate_tags_on_product stays in product/helpers.py.

Backward-compat re-exports preserved at every original import site so
external callers (dojo/location, dojo/importers, unittests, pro/) keep
working unchanged.

Adjusts unittests.test_tag_inheritance mock targets to the new module
path. Bumps EXPECTED_ZAP_IMPORT_V2/V3 baselines (477->470, 945->938) to
the actual query counts; the prior pin was already drifting on this
branch before the move.
Move the three tag-related top-level modules into a dedicated package:

  dojo/tag_inheritance.py -> dojo/tags/inheritance.py
  dojo/tag_utils.py       -> dojo/tags/utils.py
  dojo/tags_signals.py    -> dojo/tags/signals.py

Update all internal call sites to the new canonical paths (apps,
models, importers, finding, product/helpers, utils_cascade_delete,
settings, unittests). No backward-compat shims left at the old paths.

Also drops the unused dojo/tag/ stub directory that only contained
stale .pyc files.

No logic change. All tag inheritance and tag bulk unit tests pass.
@github-actions github-actions Bot added the settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR label May 15, 2026
Tag inheritance is usually enabled system-wide first, then optionally
overridden per product. The system setting is cached, so check it first
and short-circuit the related-product walk / per-product attribute read
when it's already True.

Sites updated:
- dojo/tags/inheritance.py: inherit_product_tags,
  get_products_to_inherit_tags_from, and the three early-return guards
  inside propagate_tags_on_product_sync / apply_inherited_tags_for_*.
- dojo/location/models.py: Location.products_to_inherit_tags_from.
- dojo/importers/location_manager.py: bulk-inherit early-return.

Also picks up the in-flight rename of batch_mode() -> suppress() /
is_in_batch_mode() -> is_suppressed() in the two importers.

Behavior unchanged. All tag inheritance / perf / bulk unit tests pass
(36 + 20 + 29).
…ance wrapper

- is_tag_inheritance_enabled now delegates to get_products_to_inherit_tags_from
  (no products eligible -> not enabled). Centralizes the "which products
  contribute inherited tags" decision in one place; the None-product filter
  moves into get_products_to_inherit_tags_from so both call sites get it.
- Drop the inherit_linked_instance_tags wrapper: single signal caller now
  inlines tag_inheritance.inherit_instance_tags(instance.location).
- Trim dead backward-compat re-exports from dojo/tags/signals.py; reroute
  the test imports to dojo.tags.inheritance directly.

Behavior unchanged. test_tag_inheritance (36) and test_tag_inheritance_perf
(20) pass.
Combine the two-step gate (propagate_inheritance returning bool, then
caller writing) into a single inherit_instance_tags() with an optional
force=False kwarg that bypasses the suppress() check. The m2m
make_inherited_tags_sticky receiver now also routes through this single
entry point.

Wrap the inner instance.inherit_tags(tag_list) call in
suppress_tag_inheritance() to short-circuit m2m_changed re-entry. The
re-entrant signal previously did a redundant in-sync check using a
stale get_tag_list() cached value; suppressing it both eliminates the
recursion risk and removes per-row redundant queries:

  - test_baseline_zap_scan_import_v2: 470 -> 463
  - test_baseline_zap_scan_import_v3: 938 -> 931
  - test_create_one_finding_v2/v3: 64 -> 61
  - test_create_100_findings_v2/v3: 4024 -> 3724
  - test_finding_add_user_tag_v2/v3: 17 -> 16
  - test_finding_remove_inherited_v2/v3: 44 -> 40

Also picks up the rename of the context manager
suppress() -> suppress_tag_inheritance(). Unit tests for the early-exit
optimization rewritten against the merged function (still 4 cases).
Rewrite the per-row inheritance primitive as a pure diff:

  current_inherited = obj.inherited_tags.all()
  target            = incoming_inherited_tags
  remove (current - target), add (target - current)
  + sticky re-add any target name missing from obj.tags

Drops the "rebuild full tag_list, set() everything" approach and removes
the `existing_tags_hint` parameter entirely. Writes are wrapped in
`suppress_tag_inheritance()` so the m2m_changed signal can't dispatch
make_inherited_tags_sticky back into this function.

Other cleanup:

- Rename `_manage_inherited_tags` -> `_sync_inherited_tags`.
- Drop per-model `inherit_tags(self, existing_tags_hint)` methods on
  Endpoint / Engagement / Test / Finding / Location. The signal path
  (`inherit_instance_tags`) and `LocationManager._bulk_inherit_tags` now
  call `_sync_inherited_tags(instance, incoming)` directly.
- Drop the unused `existing_tags_by_location` dict in
  `LocationManager._bulk_inherit_tags` (one fewer through-table read).
- Unit tests rewritten against the diff primitive (4 cases) plus a
  no-products early-exit test for `inherit_instance_tags`.

Perf baselines (test_tag_inheritance_perf):

  EXPECTED_ZAP_IMPORT_V2:            463 -> 422
  EXPECTED_ZAP_IMPORT_V3:            931 -> 809
  EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3:102 -> 101
  EXPECTED_CREATE_ONE_FINDING_V2/V3:  61 -> 55
  EXPECTED_CREATE_100_FINDINGS_V2/V3: 3724 -> 3124  (-600)
  EXPECTED_FINDING_REMOVE_INHERITED_V2/V3: 40 -> 18
  EXPECTED_FINDING_ADD_USER_TAG_V2/V3:     16 -> 17  (+1)
…elpers.py

- New `apply_inherited_tags_for_locations(locations, *, product)` mirrors the
  endpoint/finding helpers. Replaces `LocationManager._bulk_inherit_tags`;
  callsite delegates. Drops the redundant outer `suppress_tag_inheritance()`
  (writes inside `_sync_inherited_tags` are already wrapped) and the bulk
  pre-fetch of existing inherited tags (the through-table primitive does
  this read once already).
- `_inherited_tag_names_for_location` now filters contributing products by
  their own `enable_product_tag_inheritance` flag. Previously tags from
  flag-off products linked to the same Location could leak in via
  `propagate_tags_on_product_sync` / `apply_inherited_tags_for_findings`.
- Move `propagate_tags_on_product` Celery task into `dojo.tags.inheritance`
  alongside its `_sync` counterpart; delete `dojo/product/helpers.py`. Keep
  a `propagate_tags_on_product_deprecated` alias under the old task name so
  in-flight tasks complete after upgrade.
- Rename `_sync_inheritance_for_qs` kwarg `target_names_per_child` ->
  `target_tag_names_per_child` for clarity.
- Update import sites + perf baseline V3 import: 809 -> 445 queries.
…tags, move to signals

Renames `inherit_instance_tags` to `auto_inherit_product_tags` and relocates
it from `dojo.tags.inheritance` to `dojo.tags.signals`. The function is the
signal-driven entrypoint that applies a Product's tags to a child instance;
moving it next to the signal receivers (and renaming for clarity) makes the
auto-inheritance flow self-contained.

Of all the inheritance helpers it is the only one that early-returns when
`suppress_tag_inheritance()` is active — the other helpers use the context
manager around their writes for reentrancy. That gate semantic is exclusive
to the signal path, so colocation fits.

Also combines `inherit_tags_on_linked_instance` into `inherit_tags_on_instance`
with sender-based target dispatch and a `created=True` gate. Previously the
ref handler fired on every save, including `set_status` updates that don't
change the Location's related-product set; those now correctly no-op.

Drops the unused `force=` parameter on the function while renaming.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

settings_changes Needs changes to settings.py based on changes in settings.dist.py included in this PR unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant