Improve CowLog/BORIS/pybehavior compatibility: robust parsing, schema support, and annotation export#36
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 4 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6345e12e9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| resolved_kind = _resolve_event_kind_token(lowered) | ||
| if resolved_kind is not None: | ||
| event_kind = resolved_kind | ||
| state_marker_used = True |
There was a problem hiding this comment.
Preserve modifier/subject tokens before event-kind aliases
This branch now classifies any token matching _resolve_event_kind_token before checking modifiers or subjects, and the new alias set includes common values like 1, 2, on, and off. In CowLog rows, those tokens can legitimately be modifier names or subject IDs, so they get consumed as event markers, silently dropping modifier/subject data and potentially changing state semantics during import.
Useful? React with 👍 / 👎.
| def _schema_matches(value: str | None, pattern: str) -> bool: | ||
| return bool(value and re.fullmatch(pattern, value)) |
There was a problem hiding this comment.
Reject non-string schema values safely
The new schema matcher calls re.fullmatch on any truthy schema value, which raises TypeError for non-strings (for example, malformed JSON with a numeric schema). Because session/project/ethogram import validation now routes through this helper, those payloads produce server errors instead of the expected unsupported-format validation path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request significantly enhances the system's compatibility with CowLog and BORIS data formats by improving the parsing of complex timecodes, including HH:MM:SS:FF and ISO 8601 durations. It introduces the ability to merge multiple observations during import, handles metadata more comprehensively, and ensures that annotations are included in CowLog exports. Additionally, the internal data resolution logic was updated to support both list and dictionary-based payloads. Review feedback suggests making the timecode normalization more robust and ensuring consistent decimal separator handling for frame-based timestamps.
| if re.fullmatch(r'\d{1,3}:\d{1,2}:\d{1,2};\d{1,3}', token): | ||
| token = token.replace(';', ':') |
There was a problem hiding this comment.
The current regex for semicolon-separated timecodes is quite restrictive as it only matches the 4-part HH:MM:SS;FF format. It would be more robust to normalize semicolons to colons for any timecode-like string (e.g., MM:SS;FF) before splitting, provided it doesn't conflict with ISO 8601 duration markers.
| if re.fullmatch(r'\d{1,3}:\d{1,2}:\d{1,2};\d{1,3}', token): | |
| token = token.replace(';', ':') | |
| if ';' in token and not token.startswith(('P', 'p')): | |
| token = token.replace(';', ':') |
| hours = Decimal(parts[0]) | ||
| minutes = Decimal(parts[1]) | ||
| seconds = Decimal(parts[2].replace(',', '.')) | ||
| frames = Decimal(parts[3]) |
There was a problem hiding this comment.



Motivation
Description
_schema_matches,_is_supported_*helpers and_coerce_object_rows, and updated code paths to accept mapping or list forms forevents,annotations,segments, andobservationsacrossimport_session_payload,import_project_payload, and related flows._decimalimplementation to parse ISO8601 (PT...S),HH:MM:SS:FFframe timecodes (honoringframe_rate), SMPTE semicolon variants, and comma/locale decimals, and threadedframe_ratethrough parsing where relevant.parse_cowlog_results_text,parse_tabular_session_rows) to detect metadata (headers), extract annotations from metadata/rows, support semicolon-delimited rows,shlextokenization, duration columns for state intervals, per-row/custom FPS, and better warnings.session.notesvia_append_note_line.session_export_cowlog_txt._resolve_event_kind_tokenmappings and added_coerce_name_list/_normalize helpers for more robust name-list coercion.Testing
tracker/tests/including updated and new tests intest_compatibility.py,test_helpers.py, andtest_roundtrip.py, which exercise CowLog parsing, BORIS/tabular imports, project/ethogram imports, merged observations, timecode formats, and CowLog export annotations.cowlog-results-v2, CowLog timecode variants and metadata annotations, tabular frame/timecode/duration handling, mapping-style BORIS payloads and merged observations, and roundtrip normalization; all tests passed locally.Codex Task