Store calo SimParticle ancestor chain in SimInfo::ancestorSimIds #366

Open
zwl0331 wants to merge 1 commit into Mu2e:main from zwl0331:add-ancestor-sim-ids

zwl0331 commented May 4, 2026

Add a std::vector<int> ancestorSimIds field to SimInfo and populate it for calorimeter SimInfos in fillCaloSimInfos. After fillSimInfo returns, start at edep.sim() and follow parent() until hasParent() is false (or the parent pointer is null, as can happen in compressed events), pushing each ancestor's id() into the vector. The vector lists ancestors from the immediate parent up to the root: [parentId, grandparentId, ...].
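
The walk described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: MockSimParticle and collectAncestorIds are hypothetical names, and the mock models only the id(), hasParent() and parent() accessors mentioned in the description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the real SimParticle; it models only the three
// accessors named in the description.
class MockSimParticle {
 public:
  MockSimParticle(int id, bool hasParent, const MockSimParticle* parent)
      : id_(id), hasParent_(hasParent), parent_(parent) {}

  int id() const { return id_; }
  bool hasParent() const { return hasParent_; }
  const MockSimParticle* parent() const { return parent_; }

 private:
  int id_;
  bool hasParent_;                  // true if a parent was recorded
  const MockSimParticle* parent_;   // may be null even when hasParent_ is true
};

// Collect ancestor ids ordered from the immediate parent up to the root.
// Hypothetical helper name; the PR does this inline in fillCaloSimInfos.
std::vector<int> collectAncestorIds(const MockSimParticle& sim) {
  std::vector<int> ids;
  const MockSimParticle* current = &sim;
  while (current->hasParent()) {
    const MockSimParticle* parent = current->parent();
    if (parent == nullptr) {
      break;  // parent was dropped (e.g. compressed event): stop the walk
    }
    ids.push_back(parent->id());
    current = parent;
  }
  return ids;
}
```

With a chain root(1) <- mid(5) <- leaf(9), calling collectAncestorIds on the leaf yields [5, 1]; a particle with no parent, or one whose parent pointer is null, yields an empty vector.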

Motivation. The existing prirel / trkrel MCRelationships are populated for tracker SimInfos but not for calorimeter SimInfos, and they only capture the relationship to the primary particle anyway. For calorimeter showering analyses we need the full Geant4 parent chain so each hit can be grouped under its calo-entrant ancestor (the highest ancestor that also deposited in the same disk). This recovers true shower membership for downstream truth labelling: a primary electron's secondary photons (eBrem dominates at ~50% process share) deposit independently in adjacent crystals and otherwise look like independent truth clusters under the existing "dominant SimParticle per crystal" rule.
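
As a rough sketch of how a downstream analysis might pick the calo-entrant ancestor from this field: caloEntrantAncestor and its arguments are hypothetical names, the set of SimParticle ids that deposited in the disk is assumed to be built elsewhere, and falling back to the particle's own id when no ancestor deposited is an assumption, not something this PR defines.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Return the id of the highest ancestor that also deposited in the same disk.
// ancestorSimIds is ordered from the immediate parent up to the root, so the
// last match found while scanning is the highest one.
// Assumption: if no ancestor deposited in the disk, the particle is treated
// as its own entrant.
int caloEntrantAncestor(int simId,
                        const std::vector<int>& ancestorSimIds,
                        const std::unordered_set<int>& depositedInDisk) {
  int entrant = simId;
  for (int ancestorId : ancestorSimIds) {
    if (depositedInDisk.count(ancestorId) != 0) {
      entrant = ancestorId;  // a later match is higher up the chain
    }
  }
  return entrant;
}
```

Hits can then be grouped by the returned id instead of by their own SimParticle id. Note that this version accepts a matching ancestor even when an intermediate ancestor did not deposit in the disk; whether that is the desired behaviour depends on the analysis.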

No breaking change. The vector is default-initialised to empty for any SimInfo that fillCaloSimInfos doesn't touch (i.e. all tracker SimInfos). ROOT serialises std::vector<int> inside the already-registered SimInfo automatically, so no new dictionary entry is needed.
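
For readers consuming the new field, a minimal sketch of the default behaviour. SimInfoSketch is a hypothetical, cut-down stand-in for SimInfo (the real struct has many more fields), and immediateParentId is an illustrative helper, not part of this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SimInfo showing only the new field.
struct SimInfoSketch {
  std::vector<int> ancestorSimIds;  // default-constructed, so empty unless filled
};

// Return the immediate parent's id, or `fallback` when the chain is empty.
// Reading front() on an empty vector is undefined, hence the guard.
int immediateParentId(const SimInfoSketch& info, int fallback = -1) {
  return info.ancestorSimIds.empty() ? fallback : info.ancestorSimIds.front();
}
```

An empty vector does not by itself distinguish a SimInfo that was never filled (tracker) from a calorimeter SimInfo whose particle has no parent, so consumers should not use emptiness to tell the two apart.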

Validated by reprocessing 50 MDC2025af MCS art files via FermiGrid: zero broken chains over 9,000 events / 203,971 SimParticles, with a mean chain length of 1.90 (median 1, max 14). Switching a downstream GNN calorimeter-clustering analysis to "calo-entrant ancestor" truth via this field cut merge errors in half and raised the truth match rate by 6.2 pp on the validation split, with no model retraining.
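
For anyone repeating the check, chain-length statistics of the kind quoted above can be computed along these lines. This is a sketch, not the script used for the validation: ChainLengthSummary and summariseChainLengths are hypothetical names, and the input is assumed to be one ancestorSimIds.size() per SimInfo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct ChainLengthSummary {
  double mean;
  double median;
  std::size_t max;
};

// Summarise a non-empty list of chain lengths. Takes the vector by value
// because it is sorted in place to find the median and maximum.
ChainLengthSummary summariseChainLengths(std::vector<std::size_t> lengths) {
  assert(!lengths.empty());
  std::sort(lengths.begin(), lengths.end());
  const std::size_t n = lengths.size();
  const double sum = std::accumulate(lengths.begin(), lengths.end(), 0.0);
  // For an even count, the median is the average of the two middle values.
  const double median =
      (n % 2 == 1) ? static_cast<double>(lengths[n / 2])
                   : (lengths[n / 2 - 1] + lengths[n / 2]) / 2.0;
  return {sum / n, median, lengths.back()};
}
```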
