perf(tags): batch_mode + per-batch bulk inheritance during import + reorganize#14877
Draft
valentijnscholten wants to merge 14 commits into
Draft
perf(tags): batch_mode + per-batch bulk inheritance during import + reorganize#14877valentijnscholten wants to merge 14 commits into
valentijnscholten wants to merge 14 commits into
Conversation
…ase B Stage 2)
Wraps the import / reimport hot loop in `tag_inheritance.batch_mode()` and
bulk-applies inherited Product tags per batch *before* `post_process_findings_batch`
dispatches, so rules / deduplication see inherited tags on `finding.tags`.
Changes:
- `process_findings` (DefaultImporter + DefaultReImporter) now runs its
finding-creation loop inside `batch_mode()`. Per batch, right after
`apply_import_tags_for_batch`, calls a new helper
`apply_inherited_tags_for_findings(batch_findings)` that bulk-syncs
inherited tags on the batch's Findings plus the Endpoints (V2) / Locations
(V3) reachable from them via FK chain. Inheritance is therefore applied to
the persisted children before the post-process task dispatches.
- `inherit_instance_tags` in `dojo/tags_signals.py` now early-returns when
`tag_inheritance.is_in_batch_mode()`, so the batch wrap transparently
suppresses per-row inheritance work for any caller — including
`bulk_create` cleanup paths that invoke it manually. `inherit_tags_on_instance`
post_save delegates to that helper, so the gate also covers signal-driven
fires.
- `EndpointManager.get_or_create_endpoints` replaces its per-row
`inherit_instance_tags(ep)` cleanup loop with a single
`apply_inherited_tags_for_endpoints(created)` bulk call. Inside the importer
the per-batch helper already covers these endpoints via
`Endpoint.status_endpoint.finding`; the bulk call is kept as a defensive
hook for any non-importer caller.
- `propagate_tags_on_product_sync` (used by the product-tag-toggle Celery
task) gains an early-exit when neither system-wide nor per-product
inheritance is enabled, eliminating ~9 wasted reads per call on
inheritance-off products. State transitions (toggling either flag) still
trigger a full sweep through their existing signal handlers.
- `Location` gains `iter_related_products()`: a related-manager
(`self.products` + `self.findings`) implementation of `all_related_products()`
that returns `list[Product]`. Callers that pre-issue
`prefetch_related("products__product__tags",
"findings__finding__test__engagement__product__tags")` get zero extra
queries per Location. The existing JOIN'd `all_related_products()` is kept
for the per-instance signal path where prefetching is not possible.
- `_inherited_tag_names_for_location` (the per-Location callback used to
compute the inherited target set) switches to `iter_related_products()`;
both call sites (`propagate_tags_on_product_sync` V3 branch and
`apply_inherited_tags_for_findings` V3 branch) now prefetch the chain.
Query-count impact on `unittests/test_tag_inheritance_perf.py` (pins updated
in this commit):
| Hot path | Before | After | Δ |
|-----------------------------------------|--------:|-------:|------:|
| ZAP scan import V2 (19 findings) | 1385 | 477 | -908 |
| ZAP scan import V3 | 1263 | 945 | -318 |
| ZAP reimport no-change V2 | 69 | 75 | +6 |
| ZAP reimport no-change V3 | 87 | 102 | +15 |
| Product tag add → 100 locations (V3) | 316 | 125 | -191 |
| Product tag remove → 100 locations (V3) | 266 | 75 | -191 |
Small reimport-no-change regressions are the unavoidable per-batch helper
read cost (2 reads × Finding + 2 reads × Endpoint/Location + 1 product tags
read). Real-work imports drop significantly because per-row
`_manage_inherited_tags` work no longer fires inside the loop.
ecd29bb to
8513855
Compare
Contributor
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
…-batch-during-import
Contributor
|
Conflicts have been resolved. A maintainer will review the pull request shortly. |
Move scattered tag-inheritance logic into dojo/tag_inheritance.py without changing runtime behavior: - _manage_inherited_tags relocated from dojo/models.py (replaced by a lazy-import shim to avoid an import cycle through dojo.utils). - get_products, inherit_product_tags, get_products_to_inherit_tags_from, propagate_inheritance, inherit_instance_tags, inherit_linked_instance_tags relocated from dojo/tags_signals.py; receivers there now delegate. - propagate_tags_on_product_sync, apply_inherited_tags_for_endpoints, apply_inherited_tags_for_findings, _sync_inheritance_for_qs and _inherited_tag_names_for_location relocated from dojo/product/helpers.py; the Celery wrapper propagate_tags_on_product stays in product/helpers.py. Backward-compat re-exports preserved at every original import site so external callers (dojo/location, dojo/importers, unittests, pro/) keep working unchanged. Adjusts unittests.test_tag_inheritance mock targets to the new module path. Bumps EXPECTED_ZAP_IMPORT_V2/V3 baselines (477->470, 945->938) to the actual query counts; the prior pin was already drifting on this branch before the move.
Move the three tag-related top-level modules into a dedicated package: dojo/tag_inheritance.py -> dojo/tags/inheritance.py dojo/tag_utils.py -> dojo/tags/utils.py dojo/tags_signals.py -> dojo/tags/signals.py Update all internal call sites to the new canonical paths (apps, models, importers, finding, product/helpers, utils_cascade_delete, settings, unittests). No backward-compat shims left at the old paths. Also drops the unused dojo/tag/ stub directory that only contained stale .pyc files. No logic change. All tag inheritance and tag bulk unit tests pass.
Tag inheritance is usually enabled system-wide first, then optionally overridden per product. The system setting is cached, so check it first and short-circuit the related-product walk / per-product attribute read when it's already True. Sites updated: - dojo/tags/inheritance.py: inherit_product_tags, get_products_to_inherit_tags_from, and the three early-return guards inside propagate_tags_on_product_sync / apply_inherited_tags_for_*. - dojo/location/models.py: Location.products_to_inherit_tags_from. - dojo/importers/location_manager.py: bulk-inherit early-return. Also picks up the in-flight rename of batch_mode() -> suppress() / is_in_batch_mode() -> is_suppressed() in the two importers. Behavior unchanged. All tag inheritance / perf / bulk unit tests pass (36 + 20 + 29).
…ance wrapper - is_tag_inheritance_enabled now delegates to get_products_to_inherit_tags_from (no products eligible -> not enabled). Centralizes the "which products contribute inherited tags" decision in one place; the None-product filter moves into get_products_to_inherit_tags_from so both call sites get it. - Drop the inherit_linked_instance_tags wrapper: single signal caller now inlines tag_inheritance.inherit_instance_tags(instance.location). - Trim dead backward-compat re-exports from dojo/tags/signals.py; reroute the test imports to dojo.tags.inheritance directly. Behavior unchanged. test_tag_inheritance (36) and test_tag_inheritance_perf (20) pass.
Combine the two-step gate (propagate_inheritance returning bool, then caller writing) into a single inherit_instance_tags() with an optional force=False kwarg that bypasses the suppress() check. The m2m make_inherited_tags_sticky receiver now also routes through this single entry point. Wrap the inner instance.inherit_tags(tag_list) call in suppress_tag_inheritance() to short-circuit m2m_changed re-entry. The re-entrant signal previously did a redundant in-sync check using a stale get_tag_list() cached value; suppressing it both eliminates the recursion risk and removes per-row redundant queries: - test_baseline_zap_scan_import_v2: 470 -> 463 - test_baseline_zap_scan_import_v3: 938 -> 931 - test_create_one_finding_v2/v3: 64 -> 61 - test_create_100_findings_v2/v3: 4024 -> 3724 - test_finding_add_user_tag_v2/v3: 17 -> 16 - test_finding_remove_inherited_v2/v3: 44 -> 40 Also picks up the rename of the context manager suppress() -> suppress_tag_inheritance(). Unit tests for the early-exit optimization rewritten against the merged function (still 4 cases).
Rewrite the per-row inheritance primitive as a pure diff: current_inherited = obj.inherited_tags.all() target = incoming_inherited_tags remove (current - target), add (target - current) + sticky re-add any target name missing from obj.tags Drops the "rebuild full tag_list, set() everything" approach and removes the `existing_tags_hint` parameter entirely. Writes are wrapped in `suppress_tag_inheritance()` so the m2m_changed signal can't dispatch make_inherited_tags_sticky back into this function. Other cleanup: - Rename `_manage_inherited_tags` -> `_sync_inherited_tags`. - Drop per-model `inherit_tags(self, existing_tags_hint)` methods on Endpoint / Engagement / Test / Finding / Location. The signal path (`inherit_instance_tags`) and `LocationManager._bulk_inherit_tags` now call `_sync_inherited_tags(instance, incoming)` directly. - Drop the unused `existing_tags_by_location` dict in `LocationManager._bulk_inherit_tags` (one fewer through-table read). - Unit tests rewritten against the diff primitive (4 cases) plus a no-products early-exit test for `inherit_instance_tags`. Perf baselines (test_tag_inheritance_perf): EXPECTED_ZAP_IMPORT_V2: 463 -> 422 EXPECTED_ZAP_IMPORT_V3: 931 -> 809 EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3:102 -> 101 EXPECTED_CREATE_ONE_FINDING_V2/V3: 61 -> 55 EXPECTED_CREATE_100_FINDINGS_V2/V3: 3724 -> 3124 (-600) EXPECTED_FINDING_REMOVE_INHERITED_V2/V3: 40 -> 18 EXPECTED_FINDING_ADD_USER_TAG_V2/V3: 16 -> 17 (+1)
…elpers.py - New `apply_inherited_tags_for_locations(locations, *, product)` mirrors the endpoint/finding helpers. Replaces `LocationManager._bulk_inherit_tags`; callsite delegates. Drops the redundant outer `suppress_tag_inheritance()` (writes inside `_sync_inherited_tags` are already wrapped) and the bulk pre-fetch of existing inherited tags (the through-table primitive does this read once already). - `_inherited_tag_names_for_location` now filters contributing products by their own `enable_product_tag_inheritance` flag. Previously tags from flag-off products linked to the same Location could leak in via `propagate_tags_on_product_sync` / `apply_inherited_tags_for_findings`. - Move `propagate_tags_on_product` Celery task into `dojo.tags.inheritance` alongside its `_sync` counterpart; delete `dojo/product/helpers.py`. Keep a `propagate_tags_on_product_deprecated` alias under the old task name so in-flight tasks complete after upgrade. - Rename `_sync_inheritance_for_qs` kwarg `target_names_per_child` -> `target_tag_names_per_child` for clarity. - Update import sites + perf baseline V3 import: 809 -> 445 queries.
…tags, move to signals Renames `inherit_instance_tags` to `auto_inherit_product_tags` and relocates it from `dojo.tags.inheritance` to `dojo.tags.signals`. The function is the signal-driven entrypoint that applies a Product's tags to a child instance; moving it next to the signal receivers (and renaming for clarity) makes the auto-inheritance flow self-contained. Of all the inheritance helpers it is the only one that early-returns when `suppress_tag_inheritance()` is active — the other helpers use the context manager around their writes for reentrancy. That gate semantic is exclusive to the signal path, so colocation fits. Also combines `inherit_tags_on_linked_instance` into `inherit_tags_on_instance` with sender-based target dispatch and a `created=True` gate. Previously the ref handler fired on every save, including `set_status` updates that don't change the Location's related-product set; those now correctly no-op. Drops the unused `force=` parameter on the function while renaming.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wraps the import / reimport hot loop in
tag_inheritance.suppress_tag_inheritance()and bulk-applies inherited Product tags per batch, beforepost_process_findings_batchdispatches, so rules / deduplication see inherited tags onfinding.tags. Pairs with the per-batch import-tag work landed in #14839.This is the Phase B Stage 2 follow-up to #14811 / #14812 / #14827.
Changes
process_findingsinDefaultImporterandDefaultReImporternow runs its finding-creation loop insidetag_inheritance.suppress_tag_inheritance(). Per batch — afterapply_import_tags_for_batchand beforedojo_dispatch_task(post_process_findings_batch, ...)— calls a new helperapply_inherited_tags_for_findings(batch_findings)that bulk-syncs inherited tags on the batch's Findings + the Endpoints (V2) / Locations (V3) reachable from them via FK chain.inherit_instance_tags→auto_inherit_product_tagsand moved fromdojo.tags.inheritancetodojo.tags.signals, colocated with the receivers that call it. It is the only inheritance helper that usesis_suppressed()as a gate (others usesuppress_tag_inheritance()to wrap their writes), so the gate semantic is exclusive to the signal path.post_savehandlers for child models (Endpoint/Engagement/Test/Finding/Location) and ref models (LocationFindingReference/LocationProductReference) are combined into a singleinherit_tags_on_instancehandler with sender-based target dispatch +created=Truegate. Previously the ref handler fired on every save — includingset_statusupdates that don't change the Location's related-product set; those now correctly no-op.EndpointManager.get_or_create_endpointsreplaces its per-rowinherit_instance_tags(ep)cleanup loop with a singleapply_inherited_tags_for_endpoints(created)bulk call. Inside the importer the per-batch helper already covers these endpoints viaEndpoint.status_endpoint.finding; the bulk call is retained as a defensive hook for non-importer callers.apply_inherited_tags_for_locations(locations, *, product)mirrors the endpoint / finding helpers.LocationManager._bulk_inherit_tagsis removed; the persist path delegates to the new helper. Drops a redundant outersuppress_tag_inheritance()(writes inside_sync_inherited_tagsare already wrapped) and a redundant bulk pre-fetch of existing inherited tags (the through-table primitive does that read once already)._inherited_tag_names_for_location(the per-Location callback for_sync_inheritance_for_qs) now filters contributing products by their ownenable_product_tag_inheritanceflag. Previously, tags from flag-off products linked to the same Location could leak in viapropagate_tags_on_product_syncandapply_inherited_tags_for_findings.propagate_tags_on_product_sync(used by the product-tag-toggle Celery task) gains an early-exit when neither system-wide nor per-product inheritance is enabled, eliminating ~9 wasted reads per call on inheritance-off products. State transitions (toggling either flag) still trigger a full sweep via their existing signal handlers.Locationgainsiter_related_products()— a related-manager (self.products+self.findings) implementation returninglist[Product]. Callers that pre-issueprefetch_related("products__product__tags", "findings__finding__test__engagement__product__tags")get zero extra queries per Location. The existing JOIN'dall_related_products()is kept for the per-instance signal path where prefetching is not possible.dojo.tags.inheritance. Move thepropagate_tags_on_productCelery task into that module alongside its_synccounterpart and deletedojo/product/helpers.py. Apropagate_tags_on_product_deprecatedalias is kept under the old task name so in-flight tasks complete after upgrade._sync_inheritance_for_qskwargtarget_names_per_child→target_tag_names_per_childfor clarity.Query-count impact
Pins in
unittests/test_tag_inheritance_perf.pyupdated to match. "Before" numbers are the pre-Phase-A baselines recorded in source comments next to each constant.