Improve BORIS/CowLog compatibility: parsing, schema tolerance, timecodes, and exports#40
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughSummary by CodeRabbit
WalkthroughEnhanced compatibility support for BORIS and CowLog formats across multiple schema versions. Core logic now handles dict/list normalization for observations, temporal parsing accepts multiple timecode formats (HH:MM:SS:FF, ISO-8601, frame-based), and import/export processes merge multi-observation data with frame-rate awareness and metadata preservation. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 0 |
| Duplication | -1 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
|
There was a problem hiding this comment.
Code Review
This pull request significantly enhances BORIS and CowLog compatibility by introducing support for multi-observation merging, advanced timecode parsing (ISO8601, SMPTE, and frame-based), and improved metadata handling. Key updates include an overhauled _decimal utility for complex time conversions, auto-detection of delimiters in tabular files, and the inclusion of annotations and observer data in CowLog exports. Review feedback focuses on preventing potential data loss when appending notes, improving the robustness of metadata parsing to handle various colon placements, and ensuring consistent FPS variable identification across the codebase.
I am having trouble creating individual review comments. Click here to see my feedback.
tracker/views.py (676-686)
The _append_note_line function can cause unintended data loss. If the existing notes already exceed max_length, they will be truncated even if the candidate line is empty or already present in the notes. Additionally, if ObservationSession.notes is a TextField (which typically has no length limit in PostgreSQL), an arbitrary 2000-character limit might be unnecessary. It is safer to avoid truncation of existing content and only apply limits if strictly required by the database schema.
def _append_note_line(existing: str | None, line: str, *, max_length: int = 2000) -> str:
"""Append one note line while avoiding duplicate entries."""
current = (existing or '').strip()
candidate = (line or '').strip()
if not candidate:
return current
lines = [item.strip() for item in current.splitlines() if item.strip()]
if candidate in lines:
return current
lines.append(candidate)
result = '\n'.join(lines)
return result[:max_length] if max_length else result
tracker/views.py (2141-2144)
The metadata parsing logic in parse_cowlog_results_text is fragile. It fails to correctly identify annotations or metadata keys if a colon is attached to the keyword in a multi-token line (e.g., # note:10.5 Marker or # fps: 30). Additionally, metadata values might retain a leading colon if it was parsed as a separate token (e.g., # session : My Session). A more robust approach would be to normalize the first token by stripping the trailing colon and handling an optional colon as the second token.
metadata_parts = [part.strip() for part in raw_metadata_parts if part.strip()]
if metadata_parts:
if ':' in metadata_parts[0]:
key, first_val = metadata_parts[0].split(':', 1)
metadata_parts[0] = key.strip()
if first_val.strip():
metadata_parts.insert(1, first_val.strip())
elif len(metadata_parts) > 1 and metadata_parts[1] == ':':
metadata_parts.pop(1)tracker/views.py (3632-3639)
The logic for identifying the FPS variable definition is inconsistent with the export logic at line 5838. The current implementation only replaces spaces with underscores, which might miss variants like frame-rate. Using a case-insensitive regex similar to the one used in session_export_cowlog_txt would be more robust and maintainable.
fps_definition = next(
(
definition
for label, definition in variable_map.items()
if re.fullmatch(r'^(fps|frame[_ -]?rate|framerate)$', label, re.I)
),
None,
)
💡 Codex ReviewPyBehaviorLog/tracker/views.py Line 2515 in 058597b The new PyBehaviorLog/tracker/views.py Line 686 in 058597b The new note helper silently slices notes to 2000 characters on every append, so importing BORIS/CowLog metadata into a session with longer existing notes will irreversibly drop note content. Since ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |



Motivation
Description
plan.mdwith an implementation roadmap for BORIS/CowLog compatibility work._schema_matches,_is_supported_session_schema,_is_supported_observation_schema,_is_supported_project_schema,_is_supported_ethogram_schema) and accepting newer CowLog/BORIS variants (e.g.cowlog-results-v2,boris-observation-v4,boris-project-v4)._coerce_object_rowsand updated resolution helpers to accept both lists and mappings when readingevents,annotations,segments, andobservations._decimalto support ISO8601 durations, SMPTE-like semicolon timecodes, HH:MM:SS:FF frame tokens using a configurable frame rate, and improved comma decimal handling._normalize_frame_rate_tokento parse numeric, ratio (30000/1001) and unit-bearing frame-rate tokens and used it when converting frame indices/timecodes.parse_cowlog_results_text) to extract metadata lines, annotations from#metadata, detect frame-rate headers, accept semicolon-separated rows and quoted tokens (viashlexfallback), and surface metadata in the returned payload/report.parse_tabular_session_rows/parse_tabular_session_file) to autodetect semicolon CSV, parse frame/time formats, apply per-rowframe_rateorfpstokens, and supportdurationcolumns to compute stop times.session.notesusing_append_note_lineand by adding comments or prefixed titles to merged events/annotations.ObservationVariableValuewhen a matching variable definition exists.session_export_cowlog_txtnow includes# observer,# fpswhen available, and emits# annotationlines for session annotations; export header keeps compatibility notice.tracker/tests/*to cover new parsing cases, schemas, frame-rate variants, metadata annotations, mapping rows, multi-observation merges, and roundtrip equivalence.Testing
tracker/tests/includingtest_compatibility.py,test_helpers.py, andtest_roundtrip.py, which exercise CowLog text parsing, tabular parsing, session/project/ethogram imports, merges, and exports, and all assertions passed.session_export_cowlog_txtand compatibility report generation via tests that validate presence of# fps,# observer, and# annotationlines, which succeeded.Codex Task