
Improve BORIS/CowLog compatibility: parsing, schema tolerance, timecodes, and exports#40

Merged
Smartappli merged 2 commits into main from codex/adapter-code-pour-compatibilite-avec-boris-et-cowlog
May 1, 2026


@Smartappli
Owner

Motivation

  • Broaden and stabilize import/export interoperability with BORIS and CowLog payloads by handling newer schemas, mapping-style rows, and richer metadata.
  • Improve timecode parsing (ISO durations, SMPTE/frame timecodes, decimal and comma decimals) and make frame-rate resolution explicit and traceable.
  • Preserve multi-observation context and metadata when merging BORIS observations into sessions and include CowLog metadata/annotations in exports.

Description

  • Added plan.md with an implementation roadmap for BORIS/CowLog compatibility work.
  • Extended schema recognition and tolerance by adding schema helper functions (_schema_matches, _is_supported_session_schema, _is_supported_observation_schema, _is_supported_project_schema, _is_supported_ethogram_schema) and accepting newer CowLog/BORIS variants (e.g. cowlog-results-v2, boris-observation-v4, boris-project-v4).
  • Normalized object/mapping-style rows via new helper _coerce_object_rows and updated resolution helpers to accept both lists and mappings when reading events, annotations, segments, and observations.
  • Enhanced time parsing with _decimal to support ISO8601 durations, SMPTE-like semicolon timecodes, HH:MM:SS:FF frame tokens using a configurable frame rate, and improved comma decimal handling.
  • Added _normalize_frame_rate_token to parse numeric, ratio (30000/1001) and unit-bearing frame-rate tokens and used it when converting frame indices/timecodes.
  • Improved CowLog plain-text parser (parse_cowlog_results_text) to extract metadata lines, annotations from # metadata, detect frame-rate headers, accept semicolon-separated rows and quoted tokens (via shlex fallback), and surface metadata in the returned payload/report.
  • Improved tabular import (parse_tabular_session_rows / parse_tabular_session_file) to autodetect semicolon CSV, parse frame/time formats, apply per-row frame_rate or fps tokens, and support duration columns to compute stop times.
  • Merged multiple BORIS observations into a single session while preserving observation labels and media labels: unique import notes are appended to session.notes via _append_note_line, and comments or prefixed titles are added to merged events/annotations.
  • Import logic now writes CowLog metadata into session notes and stores FPS metadata into ObservationVariableValue when a matching variable definition exists.
  • Export: session_export_cowlog_txt now includes # observer and # fps headers when available and emits # annotation lines for session annotations; the export header keeps its compatibility notice.
  • Added and updated many tests in tracker/tests/* to cover new parsing cases, schemas, frame-rate variants, metadata annotations, mapping rows, multi-observation merges, and roundtrip equivalence.
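
The frame-rate and timecode handling described in the list above can be illustrated with a small standalone sketch. The helper names here (`normalize_frame_rate_token`, `timecode_to_seconds`) are hypothetical stand-ins and are not the project's `_normalize_frame_rate_token` or `_decimal`. The sketch assumes non-drop-frame counting and treats `;` only as an alternative separator.

```python
import re
from fractions import Fraction
from typing import Optional


def normalize_frame_rate_token(token: str) -> Optional[Fraction]:
    """Parse '25', '29,97', '30000/1001' or '25 fps' into an exact Fraction.

    Hypothetical helper: returns None when the token is not a positive rate.
    """
    text = token.strip().lower().replace(',', '.')
    match = re.fullmatch(
        r'(\d+(?:\.\d+)?)(?:\s*/\s*(\d+(?:\.\d+)?))?\s*(?:fps|hz)?', text
    )
    if not match:
        return None
    numerator = Fraction(match.group(1))
    denominator = Fraction(match.group(2)) if match.group(2) else Fraction(1)
    if numerator <= 0 or denominator <= 0:
        return None
    return numerator / denominator


def timecode_to_seconds(timecode: str, frame_rate: Fraction) -> Fraction:
    """Convert HH:MM:SS:FF (or HH:MM:SS;FF) to seconds.

    Assumption: no drop-frame compensation is applied.
    """
    hours, minutes, seconds, frames = (
        int(part) for part in re.split(r'[:;]', timecode.strip())
    )
    return hours * 3600 + minutes * 60 + seconds + Fraction(frames) / frame_rate


ntsc = normalize_frame_rate_token('30000/1001')
pal = normalize_frame_rate_token('25 fps')
print(ntsc, pal, timecode_to_seconds('00:00:01:12', pal))
```

Keeping the rate as a `Fraction` avoids rounding 30000/1001 to 29.97 before the division, which is one way to keep frame-to-time conversion traceable.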

Testing

  • Ran the updated unit tests under tracker/tests/ including test_compatibility.py, test_helpers.py, and test_roundtrip.py, which exercise CowLog text parsing, tabular parsing, session/project/ethogram imports, merges, and exports, and all assertions passed.
  • Exercised session_export_cowlog_txt and compatibility report generation via tests that validate presence of # fps, # observer, and # annotation lines, which succeeded.


@coderabbitai

coderabbitai Bot commented May 1, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Enhanced compatibility with BORIS and CowLog formats, including support for additional schema versions
    • Added support for multiple timecode formats (clock with milliseconds, ISO-8601 durations, frame-based timecodes)
    • Improved metadata and frame-rate preservation during import and export operations
    • Enhanced round-trip data integrity validation

Walkthrough

Enhanced compatibility support for BORIS and CowLog formats across multiple schema versions. Core logic now handles dict/list normalization for observations, temporal parsing accepts multiple timecode formats (HH:MM:SS:FF, ISO-8601, frame-based), and import/export processes merge multi-observation data with frame-rate awareness and metadata preservation.
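
As a rough illustration of the dict/list normalization mentioned above, the sketch below turns either a list of row objects or a mapping of key to row object into a plain list of dicts. The function name `coerce_object_rows` and the `key_field` parameter are assumptions made for this example; the project's `_coerce_object_rows` may behave differently.

```python
from collections.abc import Mapping
from typing import Any


def coerce_object_rows(value: Any, *, key_field: str = 'id') -> list:
    """Return a list of dict rows from a list or a mapping of rows.

    Hypothetical helper: for mapping input, the mapping key is stored
    under key_field unless the row already defines that field.
    """
    if isinstance(value, Mapping):
        rows = []
        for key, item in value.items():
            if isinstance(item, Mapping):
                row = dict(item)
                row.setdefault(key_field, key)
                rows.append(row)
        return rows
    if isinstance(value, (list, tuple)):
        # Non-mapping entries are skipped rather than raising.
        return [dict(item) for item in value if isinstance(item, Mapping)]
    return []


print(coerce_object_rows({'1': {'behavior': 'Stand'}}))
print(coerce_object_rows([{'behavior': 'Walk'}, 'garbage']))
```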

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation & Roadmap: plan.md | New structured compatibility roadmap defining work items for BORIS/CowLog fidelity, schema normalization, temporal parsing with SMPTE support, integrity validation, roundtrip preservation, and UX requirements for pre-import preview and manual correction. |
| Core Compatibility Logic: tracker/compatibility.py, tracker/views.py | Schema version expansion (cowlog-results-v2, boris-observation-v4) with dict/list normalization for events/annotations/segments; enhanced time parsing (HH:MM:SS:FF, ISO-8601, frame timecodes, comma decimals) with FPS-aware conversion; CowLog metadata extraction (observer, FPS, annotations, notes) into structured fields; multi-observation merging with prefixed event labels; tabular session parsing with duration-to-stop-time derivation; export enhancements including FPS/observer headers and annotation output. |
| Compatibility Test Suite: tracker/tests/test_compatibility.py, tracker/tests/test_helpers.py, tracker/tests/test_roundtrip.py | Expanded coverage for CowLog text parsing (multiple timecode formats, metadata annotations, header fields, FPS variants), tabular CSV import (Boris-style, custom FPS columns), multi-observation merging semantics, schema version acceptance (cowlog-results-v2, boris-observation-v4, boris-project-v4), roundtrip normalization of mapping-based vs list-based observation structures, and independent-variable deduplication on re-import. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Formats dance and timecodes align,
Frame rates parsed with precision divine,
BORIS meets CowLog in harmonious blend,
Observations merge from start to the end,
Metadata preserved, round-trip pristine—
The finest compatibility ever seen!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.46%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main changes: BORIS/CowLog compatibility improvements including parsing, schema tolerance, timecodes, and exports, which directly align with the changeset. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, covering motivation, detailed implementation changes, testing methodology, and specific code modifications across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@codacy-production

codacy-production Bot commented May 1, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 0 complexity · -1 duplication

Metric Results
Complexity 0
Duplication -1



@Smartappli Smartappli merged commit a290e6c into main May 1, 2026
6 of 27 checks passed
@Smartappli Smartappli deleted the codex/adapter-code-pour-compatibilite-avec-boris-et-cowlog branch May 1, 2026 11:36
@sonarqubecloud

sonarqubecloud Bot commented May 1, 2026


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly enhances BORIS and CowLog compatibility by introducing support for multi-observation merging, advanced timecode parsing (ISO8601, SMPTE, and frame-based), and improved metadata handling. Key updates include an overhauled _decimal utility for complex time conversions, auto-detection of delimiters in tabular files, and the inclusion of annotations and observer data in CowLog exports. Review feedback focuses on preventing potential data loss when appending notes, improving the robustness of metadata parsing to handle various colon placements, and ensuring consistent FPS variable identification across the codebase.

I am having trouble creating individual review comments, so my feedback is listed below.

tracker/views.py (676-686)

Severity: high

The _append_note_line function can cause unintended data loss. If the existing notes already exceed max_length, they will be truncated even if the candidate line is empty or already present in the notes. Additionally, if ObservationSession.notes is a TextField (which typically has no length limit in PostgreSQL), an arbitrary 2000-character limit might be unnecessary. It is safer to avoid truncation of existing content and only apply limits if strictly required by the database schema.

def _append_note_line(existing: str | None, line: str, *, max_length: int = 2000) -> str:
    """Append one note line while avoiding duplicate entries."""
    current = (existing or '').strip()
    candidate = (line or '').strip()
    if not candidate:
        return current
    lines = [item.strip() for item in current.splitlines() if item.strip()]
    if candidate in lines:
        return current
    lines.append(candidate)
    result = '\n'.join(lines)
    return result[:max_length] if max_length else result
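
The suggestion above still slices the combined text when `max_length` is set. A stricter variant, sketched here under the assumption that dropping the new line is preferable to cutting older notes, never shortens existing content. The name `append_note_line` and the skip-on-overflow policy are illustrative choices, not the project's behavior.

```python
from typing import Optional


def append_note_line(
    existing: Optional[str],
    line: Optional[str],
    *,
    max_length: Optional[int] = None,
) -> str:
    """Append a unique note line without ever truncating existing notes.

    Hypothetical policy: if adding the line would exceed max_length,
    the original notes are returned unchanged.
    """
    original = existing or ''
    candidate = (line or '').strip()
    if not candidate:
        return original
    lines = [item.strip() for item in original.splitlines() if item.strip()]
    if candidate in lines:
        return original
    result = '\n'.join(lines + [candidate])
    if max_length is not None and len(result) > max_length:
        return original
    return result


long_notes = 'a' * 3000
print(len(append_note_line(long_notes, 'Imported from BORIS', max_length=2000)))
```

With this policy a session whose notes already exceed the limit keeps every character, at the cost of not recording the new import note.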

tracker/views.py (2141-2144)

Severity: medium

The metadata parsing logic in parse_cowlog_results_text is fragile. It fails to correctly identify annotations or metadata keys if a colon is attached to the keyword in a multi-token line (e.g., # note:10.5 Marker or # fps: 30). Additionally, metadata values might retain a leading colon if it was parsed as a separate token (e.g., # session : My Session). A more robust approach would be to normalize the first token by stripping the trailing colon and handling an optional colon as the second token.

                metadata_parts = [part.strip() for part in raw_metadata_parts if part.strip()]
                if metadata_parts:
                    if ':' in metadata_parts[0]:
                        key, first_val = metadata_parts[0].split(':', 1)
                        metadata_parts[0] = key.strip()
                        if first_val.strip():
                            metadata_parts.insert(1, first_val.strip())
                    elif len(metadata_parts) > 1 and metadata_parts[1] == ':':
                        metadata_parts.pop(1)
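
To make the colon cases concrete, the same normalization can be written as a standalone function that takes a whole metadata line. `split_metadata_line` is a hypothetical name used only for this sketch, and splitting on whitespace is an assumption about how the tokens are produced.

```python
def split_metadata_line(line: str) -> tuple:
    """Return (key, value) for a '# key: value' style metadata line.

    Handles 'key: value', 'key:value rest', 'key : value' and 'key value'.
    """
    parts = line.lstrip('#').split()
    if not parts:
        return '', ''
    if ':' in parts[0]:
        # Colon attached to the keyword, possibly followed by a value.
        key, first_val = parts[0].split(':', 1)
        rest = ([first_val] if first_val else []) + parts[1:]
    else:
        key = parts[0]
        rest = parts[1:]
        if rest and rest[0] == ':':
            # Colon parsed as its own token.
            rest = rest[1:]
    return key.strip().lower(), ' '.join(rest)


for sample in ('# fps: 30', '# note:10.5 Marker', '# session : My Session'):
    print(split_metadata_line(sample))
```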

tracker/views.py (3632-3639)

Severity: medium

The logic for identifying the FPS variable definition is inconsistent with the export logic at line 5838. The current implementation only replaces spaces with underscores, which might miss variants like frame-rate. Using a case-insensitive regex similar to the one used in session_export_cowlog_txt would be more robust and maintainable.

            fps_definition = next(
                (
                    definition
                    for label, definition in variable_map.items()
                    if re.fullmatch(r'^(fps|frame[_ -]?rate|framerate)$', label, re.I)
                ),
                None,
            )
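
A quick way to check which labels such a pattern accepts is to run it against a few variants. In this sketch the dictionary contents and the name `find_fps_definition` are made up for illustration; only the regular expression follows the suggestion above (without the `^`/`$` anchors, which `fullmatch` makes redundant).

```python
import re

# Case-insensitive match for 'fps', 'framerate', 'frame rate',
# 'frame_rate' and 'frame-rate'.
FPS_LABEL = re.compile(r'(fps|frame[_ -]?rate|framerate)', re.IGNORECASE)


def find_fps_definition(variable_map: dict):
    """Return the first definition whose label looks like an FPS variable."""
    return next(
        (
            definition
            for label, definition in variable_map.items()
            if FPS_LABEL.fullmatch(label.strip())
        ),
        None,
    )


print(find_fps_definition({'Observer': 'obs-def', 'Frame-Rate': 'fps-def'}))
print(find_fps_definition({'speed': 'speed-def'}))
```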

@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

if ',' in first_line or ';' in first_line or ' ' in first_line:

P1: Route semicolon-delimited .txt imports through CowLog parser

The new ';' in first_line check causes semicolon-delimited CowLog text files (for example 2.0;Stand;begin;Cow 1) to be misclassified as tabular input, so parse_tabular_session_file treats the first event row as a header and imports zero events. This breaks the new semicolon CowLog compatibility path specifically for .txt uploads whose first token is not parseable by float().
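
One possible way to separate the two cases, offered only as a sketch, is to look at the first field of the first line after splitting on semicolons and whitespace: a numeric time or clock value suggests a CowLog event row, while anything else is treated as a tabular header. The names `choose_parser` and `looks_like_cowlog_row` are hypothetical, and the project may route uploads differently.

```python
import re

# Plain or comma-decimal seconds, or HH:MM:SS with optional fraction.
TIME_TOKEN = re.compile(r'\d+(?:[.,]\d+)?|\d{1,2}:\d{2}:\d{2}(?:[.,]\d+)?')


def looks_like_cowlog_row(first_line: str) -> bool:
    """True when the first field parses as a time value or the line is metadata."""
    stripped = first_line.strip()
    if stripped.startswith('#'):
        return True
    token = re.split(r'[;\s]+', stripped, maxsplit=1)[0]
    return bool(TIME_TOKEN.fullmatch(token))


def choose_parser(first_line: str) -> str:
    """Return 'cowlog' or 'tabular' for a .txt upload (illustrative routing)."""
    return 'cowlog' if looks_like_cowlog_row(first_line) else 'tabular'


print(choose_parser('2.0;Stand;begin;Cow 1'))
print(choose_parser('time;behavior;modifier'))
```

Splitting on `;` before testing the first field is what lets `2.0;Stand;begin;Cow 1` be recognized as an event row instead of a header.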


return '\n'.join(lines)[:max_length]

P1: Avoid truncating existing session notes during import

The new note helper silently slices notes to 2000 characters on every append, so importing BORIS/CowLog metadata into a session with longer existing notes will irreversibly drop note content. Since ObservationSession.notes is a TextField, this introduces unexpected data loss that did not exist before this import-path change.
