Fix: keep sarek `status` column when all samples are normal (status=0) by Osamaali313 · Pull Request #323 · anthropics/knowledge-work-plugins

Osamaali313 · 2026-06-14T15:43:33Z

Problem

generate_samplesheet.py can silently drop the required status column from a generated nf-core/sarek samplesheet whenever every sample is normal.

In _write_samplesheet, output columns are selected by truthiness:

active_columns = [c for c in column_names if any(c in row and row[c] for row in rows)]

The sarek status column is an integer where 0 = normal, 1 = tumor (config/pipelines/sarek.yaml, described as "critical for somatic calling"). Because 0 is falsy, a cohort in which all samples are normal makes any(... and row[c] ...) evaluate to False, so the status column is omitted from the written CSV.
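To see the failure in isolation, here is a minimal sketch of the quoted truthiness filter applied to an all-normal and a mixed cohort. The column order, the `truthy_columns` wrapper, and the row values are illustrative assumptions, not copied from the script.

```python
# Assumed column order for illustration; the real list comes from the pipeline config.
column_names = ["patient", "sample", "fastq_1", "fastq_2", "status"]


def truthy_columns(rows):
    """Truthiness-based column filter, as quoted from _write_samplesheet."""
    return [c for c in column_names if any(c in row and row[c] for row in rows)]


all_normal = [
    {"patient": "P1", "sample": "P1_blood", "fastq_1": "P1_R1.fastq.gz",
     "fastq_2": "P1_R2.fastq.gz", "status": 0},
    {"patient": "P2", "sample": "P2_blood", "fastq_1": "P2_R1.fastq.gz",
     "fastq_2": "P2_R2.fastq.gz", "status": 0},
]
mixed = all_normal + [
    {"patient": "P1", "sample": "P1_tumor", "fastq_1": "P1T_R1.fastq.gz",
     "fastq_2": "P1T_R2.fastq.gz", "status": 1},
]

# Every status value is 0, which is falsy, so the column is filtered out.
print(truthy_columns(all_normal))
# One row has status=1, so any(...) is True and the column survives.
print(truthy_columns(mixed))
```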

This is easy to hit in practice:

Germline-only runs, where every sample is legitimately status=0.
--no-interactive mode: any sample whose name lacks a tumor/normal keyword defaults to status=0 (_process_sarek_samples), so a directory of plainly-named FASTQs yields an all-normal cohort and loses the column entirely.

The bug is masked by validation: validate_samplesheet runs on the in-memory row dicts (which do contain status), not on the written file — so it reports the samplesheet as valid while the emitted CSV is missing a required column.
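One way to catch this class of problem would be to check the header of the emitted file rather than the in-memory rows. The sketch below is hypothetical and not part of this PR: `missing_required_columns` and the `required` list are illustrative names and values, not functions or settings from generate_samplesheet.py.

```python
import csv
import io


def missing_required_columns(csv_text, required):
    """Return the required column names that are absent from a written CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty file -> empty header
    return [name for name in required if name not in header]


# Header as emitted before the fix for an all-normal cohort.
written = (
    "patient,sample,fastq_1,fastq_2\n"
    "P1,P1_blood,P1_R1.fastq.gz,P1_R2.fastq.gz\n"
)
# Assumed required set, for illustration only.
required = ["patient", "sample", "fastq_1", "status"]

print(missing_required_columns(written, required))  # ['status']
```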

Reproduction

rows = [
    {"patient": "P1", "sample": "P1_blood", "fastq_1": ".../P1_R1.fastq.gz", "fastq_2": ".../P1_R2.fastq.gz", "status": 0},
    {"patient": "P2", "sample": "P2_blood", "fastq_1": ".../P2_R1.fastq.gz", "fastq_2": ".../P2_R2.fastq.gz", "status": 0},
]
# before: header = patient,sample,fastq_1,fastq_2      <-- status dropped
# after:  header = patient,sample,fastq_1,fastq_2,status

A mixed cohort (at least one sample with status=1) was unaffected, which is why this slipped through.

Fix

Select columns by explicit presence (value is not None and value != "") instead of truthiness, so valid falsy values are preserved while genuinely-empty columns are still dropped.
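A minimal sketch of the presence-based selection, run against the reproduction rows. `_has_value` mirrors the helper in the diff. `select_columns` and the use of `csv.DictWriter` are assumptions for illustration, since the rest of _write_samplesheet is not shown here.

```python
import csv
import io


def _has_value(v):
    # "Present" means neither None nor the empty string; 0 and False count as data.
    return v is not None and v != ""


def select_columns(column_names, rows):
    """Keep a column if at least one row carries a present value for it."""
    return [c for c in column_names if any(_has_value(row.get(c)) for row in rows)]


column_names = ["patient", "sample", "fastq_1", "fastq_2", "status"]
rows = [
    {"patient": "P1", "sample": "P1_blood", "fastq_1": "P1_R1.fastq.gz",
     "fastq_2": "P1_R2.fastq.gz", "status": 0},
    {"patient": "P2", "sample": "P2_blood", "fastq_1": "P2_R1.fastq.gz",
     "fastq_2": "P2_R2.fastq.gz", "status": 0},
]

active = select_columns(column_names, rows)

# Write to an in-memory buffer so the header can be inspected directly.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=active, extrasaction="ignore", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue().splitlines()[0])  # patient,sample,fastq_1,fastq_2,status
```

Comparing against `None` and `""` explicitly, rather than testing truthiness, is what lets `0` through while still treating missing or blank cells as empty.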

Verification

All-normal cohort → status retained ✅ (previously dropped)
Tumor-only cohort (all status=1) → status retained ✅
Mixed cohort → unchanged ✅
Single-end data (fastq_2 all empty) → fastq_2 still correctly dropped ✅
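The four cases above could be captured as plain assertions, with no test harness required. This is a sketch rather than a test shipped with the PR: `active_columns` is a hypothetical stand-alone copy of the fixed filter, and `make_row` and its values are made up for illustration.

```python
def _has_value(v):
    return v is not None and v != ""


def active_columns(column_names, rows):
    """Stand-alone copy of the presence-based column filter."""
    return [c for c in column_names if any(c in row and _has_value(row[c]) for row in rows)]


COLUMNS = ["patient", "sample", "fastq_1", "fastq_2", "status"]


def make_row(sample, status, fastq_2="r2.fastq.gz"):
    # Hypothetical row builder; only `status` and `fastq_2` vary between cases.
    return {"patient": "P1", "sample": sample, "fastq_1": "r1.fastq.gz",
            "fastq_2": fastq_2, "status": status}


cases = {
    "all-normal": [make_row("n1", 0), make_row("n2", 0)],
    "tumor-only": [make_row("t1", 1), make_row("t2", 1)],
    "mixed": [make_row("n1", 0), make_row("t1", 1)],
}
for name, rows in cases.items():
    # status must be retained in every paired-end cohort, including all-normal.
    assert active_columns(COLUMNS, rows) == COLUMNS, name

# Single-end data: fastq_2 is blank or None in every row, so it is still dropped.
single_end = [make_row("n1", 0, fastq_2=""), make_row("n2", 0, fastq_2=None)]
assert active_columns(COLUMNS, single_end) == ["patient", "sample", "fastq_1", "status"]
```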

(The repo has no unit-test harness or pytest config, and CI does not run script tests, so no test file is added — the change is a minimal one-function fix.)

_write_samplesheet selected output columns by truthiness (`any(c in row and row[c] ...)`), which treats the valid value 0 as empty. For nf-core/sarek the `status` column is 0=normal / 1=tumor, so an all-normal cohort wrote a samplesheet with the required `status` column silently dropped. Such cohorts are common for germline runs and are the guaranteed result of `--no-interactive` when sample names lack tumor/normal keywords (every sample then defaults to status=0).

The in-memory validation passes because it runs on the row dicts (which contain status), not on the written CSV, so the problem was masked.

Use an explicit presence check (`value is not None and value != ""`) so valid falsy values are preserved while genuinely-empty columns are still dropped.

Verified: all-normal and tumor-only cohorts now retain status; empty columns (e.g. single-end fastq_2) are still omitted.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates samplesheet column filtering so columns with valid falsy values (e.g., 0) are retained rather than being dropped as “empty”.

Changes:

Replaces truthiness-based filtering with an explicit “has value” check for samplesheet columns.
Adds an inline helper (_has_value) to define what counts as “present” data.


+    # Filter to columns that have data. Use an explicit presence check rather
+    # than truthiness so that valid falsy values are not treated as empty -
+    # notably sarek's `status` column where 0 means "normal" (an all-normal
+    # cohort would otherwise drop the required status column entirely).
+    def _has_value(v):
+        return v is not None and v != ""


Copilot AI review requested due to automatic review settings June 14, 2026 15:43

Copilot AI reviewed Jun 14, 2026

Osamaali313 wants to merge 1 commit into anthropics:main from Osamaali313:fix/samplesheet-status-column-zero
