Closed
I am closing this request because I discovered that this bug and the average bout length bug share a root cause: the same faulty cumulative sum. I will shortly submit a new pull request on this branch that addresses both bugs.
Fix: Latency First/Last Prediction Cumulative Sum Bug
Summary
- support_code/behavior_summaries.py
  - Before the groupby().sum(), the latency values are now extracted from the per-bin filtered_data using agg(lambda s: s.iloc[0]) and agg(lambda s: s.iloc[-1]). These return a Series indexed by MouseID, preserve NaN, and do not sum across bins.
  - The .head(1)/.tail(1) calls (which operated on already-summed, post-groupby data) are replaced with direct assignment from these pre-computed Series.
- tests/support_code/test_behavior_summaries.py: 7 new tests.
Context
bin_first_XX.{behavior}_latency_first_prediction and bin_last_XX.{behavior}_latency_last_prediction in the output feature CSVs are wrong. Instead of reporting the latency to the first/last prediction within the analysis window, they report a cumulative sum of per-bin latency values. For example, for the video 041345_B6J_M_42462_trimmed.avi:
- bin_first_15.Jumping_latency_first_prediction = 30000 (16.67 min), outside the 0–15 min window
- bin_first_60.Jumping_latency_first_prediction = 256925 (142.7 min), far outside the 0–60 min window

The bug is in support_code/behavior_summaries.py.
Root Cause
In support_code/behavior_summaries.py, aggregate_data_by_bin_size() (line 117):

Line 136 sums ALL numeric columns per MouseID, including the latency columns.
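A minimal sketch of that effect, not the code from line 136 itself. The DataFrame layout and the frame values are made up for illustration; only the column name latency_to_first_prediction and the MouseID grouping key come from this description.

```python
import pandas as pd

# Hypothetical per-bin rows for one mouse: three 5-minute bins, each with its
# own latency stored as an absolute frame number (values are made up).
filtered_data = pd.DataFrame(
    {
        "MouseID": ["m1", "m1", "m1"],
        "latency_to_first_prediction": [100.0, 9100.0, 18100.0],
    }
)

# Summing every numeric column per MouseID also sums the latency column.
summed = filtered_data.groupby("MouseID").sum()

print(summed.loc["m1", "latency_to_first_prediction"])  # 27300.0, not 100.0
```

The expected latency for the window would be 100 (the first bin's value), but the sum reports 27300, which lies beyond every individual bin.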
This collapses filtered_data to one row per MouseID with latency values summed across bins (e.g., bins 0–5, 5–10, 10–15 each have a latency, and they get added together).

Lines 183–188 then try to extract first/last from the already-collapsed result.
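The multi-mouse symptom can be reproduced with a small sketch. This is an assumption about the shape of lines 183–188 (a .head(1) result assigned back as a column), not a copy of them, and the mouse IDs and values are invented.

```python
import pandas as pd

# Hypothetical post-groupby frame: one row per MouseID, latencies already summed.
summed = pd.DataFrame(
    {"latency_to_first_prediction": [300.0, 450.0, 900.0]},
    index=pd.Index(["mouse_a", "mouse_b", "mouse_c"], name="MouseID"),
)

# .head(1) keeps only the first row, i.e. the first mouse.
first_only = summed["latency_to_first_prediction"].head(1)

# Assigning a one-row Series aligns on the index; every other mouse gets NaN.
summed["latency_first_prediction"] = first_only

print(summed["latency_first_prediction"])
```

Only mouse_a receives a value (and it is the summed one); mouse_b and mouse_c end up as NaN because the assigned Series has no entry for them.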
After the groupby, there is one row per MouseID. Calling .head(1) on that result returns only the first mouse's summed value: for a single-mouse run this silently produces the wrong (summed) number; for multi-mouse runs it also produces NaN for every mouse except the first (with .head(1)) or the last (with .tail(1)).
What the Values Should Be
The per-bin latency_to_first_prediction values are absolute frame numbers from the start of the video (not relative to the bin start), so:
- latency_first_prediction for a window = the first non-NaN latency_to_first_prediction across all bins in that window, per MouseID
- latency_last_prediction for a window = the last non-NaN latency_to_last_prediction across all bins in that window, per MouseID

Fix
File: support_code/behavior_summaries.py

In aggregate_data_by_bin_size(), before the groupby().sum() on line 136, extract the latency values from the per-bin filtered_data using nth(), which preserves NaN (unlike first()/last(), which skip NaN and would carry a prior-bin value forward):
- nth(0) returns the first bin's value per MouseID; if that bin has no behavior the result is NaN
- nth(-1) returns the last bin's value per MouseID; if that bin has no behavior the result is NaN, rather than falling back to a previous bin's value

Then replace lines 183–188 with direct assignment from these pre-computed values.
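The extraction and assignment can be sketched as follows. This is an illustration under assumptions, not the committed change: the data and the bout_count column are hypothetical, and the sketch uses the agg(lambda s: s.iloc[0]) form named in the Summary instead of nth().

```python
import numpy as np
import pandas as pd

# Hypothetical per-bin data: two mice, three bins each. m2 has no behavior in
# its first bin and none in its last bin. bout_count is an invented column.
filtered_data = pd.DataFrame(
    {
        "MouseID": ["m1", "m1", "m1", "m2", "m2", "m2"],
        "latency_to_first_prediction": [100.0, 9100.0, 18100.0, np.nan, 9500.0, 18200.0],
        "latency_to_last_prediction": [8900.0, 17900.0, 26900.0, np.nan, 17000.0, np.nan],
        "bout_count": [2, 1, 3, 0, 4, 0],
    }
)

grouped = filtered_data.groupby("MouseID")

# Take the first/last bin's value per mouse BEFORE anything is summed.
# iloc keeps NaN when the edge bin has no behavior.
latency_first = grouped["latency_to_first_prediction"].agg(lambda s: s.iloc[0])
latency_last = grouped["latency_to_last_prediction"].agg(lambda s: s.iloc[-1])

# For contrast: first() skips NaN and picks up a value from another bin.
first_skipping_nan = grouped["latency_to_first_prediction"].first()

# The sum is still appropriate for count-like columns.
aggregated = grouped.sum(numeric_only=True)

# Direct assignment; both sides are indexed by MouseID, so rows align.
aggregated["latency_first_prediction"] = latency_first
aggregated["latency_last_prediction"] = latency_last

print(aggregated[["bout_count", "latency_first_prediction", "latency_last_prediction"]])
```

For m2, the iloc-based extraction yields NaN while first() would report 9500 from the second bin. I chose the agg form for the sketch because, to my understanding, nth() in pandas 2.x acts as a filter and keeps the original row index, so its result would not be indexed by MouseID.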
Since both aggregated and latency_first/latency_last are indexed by MouseID after the respective groupby operations, pandas will align them correctly for all mice.
Verification
No existing tests cover behavior_summaries.py. After the fix, add a pytest test in tests/ (or support_code/tests/) that:
- calls aggregate_data_by_bin_size() with bin_size=4
- checks latency_first_prediction (first bin's value)
- checks latency_last_prediction (last bin's value)
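One possible shape for such a test, as a sketch only. The real signature of aggregate_data_by_bin_size() is not shown in this description, so the test below exercises a hypothetical stand-in, extract_window_latencies(), that mirrors just the latency extraction; the fixture data is invented.

```python
import numpy as np
import pandas as pd


def extract_window_latencies(filtered_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the latency part of aggregate_data_by_bin_size()."""
    grouped = filtered_data.groupby("MouseID")
    return pd.DataFrame(
        {
            "latency_first_prediction": grouped["latency_to_first_prediction"].agg(
                lambda s: s.iloc[0]
            ),
            "latency_last_prediction": grouped["latency_to_last_prediction"].agg(
                lambda s: s.iloc[-1]
            ),
        }
    )


def make_bins() -> pd.DataFrame:
    """Four made-up bins for each of two mice; m2's edge bins have no behavior."""
    return pd.DataFrame(
        {
            "MouseID": ["m1"] * 4 + ["m2"] * 4,
            "latency_to_first_prediction": [50.0, 9050.0, 18050.0, 27050.0,
                                            np.nan, 9200.0, 18300.0, np.nan],
            "latency_to_last_prediction": [8000.0, 17000.0, 26000.0, 35000.0,
                                           np.nan, 16500.0, 25500.0, np.nan],
        }
    )


def test_latency_is_not_summed_across_bins():
    result = extract_window_latencies(make_bins())
    # First bin's value and last bin's value, not a cumulative sum.
    assert result.loc["m1", "latency_first_prediction"] == 50.0
    assert result.loc["m1", "latency_last_prediction"] == 35000.0


def test_nan_in_edge_bin_is_preserved():
    result = extract_window_latencies(make_bins())
    # No fallback to a neighboring bin when the edge bin has no behavior.
    assert pd.isna(result.loc["m2", "latency_first_prediction"])
    assert pd.isna(result.loc["m2", "latency_last_prediction"])
```

In the real test the stand-in would be replaced by a call to aggregate_data_by_bin_size() with bin_size=4, and the column names adjusted to whatever the function actually returns.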