Conversation
zm711
left a comment
There was a problem hiding this comment.
got interrupted I'll try to look over this more later.
| "amplitude_cutoff": {"greater": None, "less": 0.2}, | ||
| "num_spikes": {"greater": 300, "less": None}, | ||
| "rp_contamination": {"greater": None, "less": 0.1}, | ||
| "rpv": {"greater": None, "less": 0.1}, # applies to rp_contamination or sliding_rp_violation |
There was a problem hiding this comment.
I wonder if it is better to keep this explicit for users? Since our qc possibilities are rp_contamination or sliding_rp_violation? Not sure...
There was a problem hiding this comment.
I agree with @zm711. In addition, thresholds could also be different.
To be extra explicit, I recommend doing something like this (I think we can remove all the Nones!):
"sliding_rp_violation": {"less": 0.1, "fallback": "rp_contamination"},
"rp_contamination": {"less": 0.1}
Internally, maybe directly in the threshold_metric function, we can have the logic that:
- looks for metrics with a fallback
- if present, use it and delete fallback entry
- if not present, use fallback entry
What do you guys think?
There was a problem hiding this comment.
Thanks both!
The reason I set it like this, is because I am not sure it makes sense to use two different RPV calculation methods. In my mind, you should choose one method and then apply the threshold. But happy to change, lmk what you think
edit: Here's the fix:
-
Dropped the "rpv" key. Users now put the explicit column name under thresholds["mua"] — either "sliding_rp_violation" or "rp_contamination" — and that single entry selects both which RPV metric gets computed and its threshold. The key is the method, just like every other metric in the dict. I think this makes the most sense.
-
Default uses "sliding_rp_violation".
-
Validation: exactly one of the two must be present; we raise a clear error if the user specifies both (ambiguous) or neither (no RPV check).
-
Documented in bombcell_get_default_thresholds, bombcell_label_units, and run_bombcell_qc docstrings.
| use_valid_periods: bool = False, | ||
| valid_periods_params: dict | None = None, | ||
| recompute_quality_metrics: bool = True, |
There was a problem hiding this comment.
I think these should not be taken care of by the bombcell function.
The quality metrics compute function already allows users to use the valid_periods if available. With the same rationale of being as explicit as possible and trying to minimize "hidden magin" (I know it's not so hidden, but pass me the term), I would encourage users to explicitly compute valid_unit_periods and metrics using these periods as part of their postprocessing, prior to using bombcell. So I would add them to the how to page, with extensive comments on why this is advisable.
There was a problem hiding this comment.
As we discussed, bombcell_label_units is now a pure labeler with no extension computation etc.
run_bombcell_qc keeps one valid-periods knob: params["compute_valid_periods"] (bool). When on, the pipeline computes valid_unit_periods with defaults if missing, then computes quality_metrics with use_valid_periods=True. Dropped the auto-derivation of fp_threshold/fn_threshold from bombcell thresholds; instead we warn if a pre-existing valid_unit_periods extension has fp/fn that disagree with the bombcell RPV / amplitude_cutoff thresholds.
alejoe91
left a comment
There was a problem hiding this comment.
Thanks @Julie-Fabre
I have several comments/suggestions.
My main comment is about keeping the valid periods/QC computation out of the bombcell function and in the bombcell/curation docs.
Can you make a separate PR about the pipeline functions?
Happy to have a call if you want!
for more information, see https://pre-commit.ci
…pikeinterface into bombcell-extension
Co-authored-by: Alessio Buccino <alejoe9187@gmail.com>
…pikeinterface into bombcell-extension
for more information, see https://pre-commit.ci
Changes to expose and enable all parameters available in native bombcell: