Skip to content

Add a logger#680

Open
Intron7 wants to merge 2 commits into
mainfrom
add-a-logger
Open

Add a logger#680
Intron7 wants to merge 2 commits into
mainfrom
add-a-logger

Conversation

@Intron7
Copy link
Copy Markdown
Member

@Intron7 Intron7 commented May 28, 2026

This adds a scanpy style logger to rsc.

@Intron7
Copy link
Copy Markdown
Member Author

Intron7 commented May 28, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Logging infrastructure with five configurable verbosity levels for operation transparency
    • Settings system enabling package-level configuration and verbosity control
    • Progress tracking with elapsed timing information across preprocessing and analysis functions

Walkthrough

This PR introduces a scverse-style logging system for rapids-singlecell with Pydantic settings-based verbosity control. It adds a new logging.py module with custom HINT log level, timestamp-aware message formatting, and five exported convenience functions, along with a Verbosity enum and Settings model for configuration. The logging infrastructure is then instrumented throughout preprocessing and tools workflows with start/finish markers and optional metadata, and comprehensive tests validate the system's behavior.

Changes

Logging infrastructure and instrumentation

Layer / File(s) Summary
Core logging module and public API
src/rapids_singlecell/logging.py
Custom logger with HINT log level (between INFO and DEBUG), _RootLogger override capturing now() and computing time_passed, _LogFormatter that injects level-specific prefixes (INFO/HINT/DEBUG), handles {time_passed} substitution, appends deep metadata, and selects output stream (stdout in IPython, stderr otherwise). Exports five functions—error, warning, info, hint, debug—each returning a timestamp and accepting optional time, deep, and extra parameters.
Verbosity enum and settings configuration
src/rapids_singlecell/_settings/verbosity.py, src/rapids_singlecell/_settings/__init__.py
Verbosity IntEnum with five members (error through debug), a level property mapping to Python logging levels, and an override(...) context manager for temporary verbosity changes. Pydantic _RscSettings model coerces case-insensitive string verbosity names to enum values via BeforeValidator, validates post-construction to apply logging level by calling rapids_singlecell.logging._set_log_level, and exports a module-level settings instance.
Package-level API wiring
src/rapids_singlecell/__init__.py, src/rapids_singlecell/decoupler_gpu/_helper/_log.py
Top-level __init__.py imports and exports logging module, Verbosity, and settings via explicit __all__; removes prior cuml.internals.logger.set_level(2) side-effect. decoupler_gpu._log helper now routes output through rapids_singlecell.logging (logg.warning for "warn" level, logg.info otherwise).
Logging in preprocessing functions
src/rapids_singlecell/preprocessing/_*
Instrumented normalize_total, log1p, pca, hvg, neighbors, bbknn, regress_out, scale, harmony_integrate, scrublet/core, and filter_genes/cells/highly_variable with start-timestamp logging, finish-with-elapsed-time logging, optional debug messages (e.g., zero_center disabled, categorical regression), optional warnings (e.g., n_neighbors capped), and optional hint blocks listing output fields added to adata.
Logging in tools functions
src/rapids_singlecell/tools/_*
Instrumented leiden, louvain, rank_genes_groups, tsne, and umap with start-timestamp logging, finish-with-elapsed-time logging, debug messages for parameters/groups, optional warnings (e.g., UMAP neighbor-graph method validation), and hint/info blocks describing output keys and uns fields written.
Comprehensive logging and settings tests
tests/test_logging.py
Test suite with 11 test functions validating logging defaults and handler installation, log record emission and numeric level mappings, formatted output strings by level, hint "deep" message rendering, elapsed-time tracking (using monkeypatch to control timestamps), verbosity propagation to cuml logger, context-manager-based temporary verbosity overrides, string-to-enum coercion, validation errors on invalid names, and public re-exports. Two autouse fixtures attach caplog handler and reset verbosity state after each test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • scverse/rapids-singlecell#625: Both modify src/rapids_singlecell/preprocessing/_harmony_integrate.py — this PR adds logging, the other updates the Harmony2 function signature and parameter forwarding.

Suggested reviewers

  • Zethson
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.60% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add a logger' accurately and concisely describes the main change, which is the introduction of a Scanpy-style logging system to the rapids_singlecell package.
Description check ✅ Passed The description 'This adds a scanpy style logger to rsc' directly relates to the changeset and accurately characterizes the primary purpose of the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch add-a-logger

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/rapids_singlecell/logging.py`:
- Around line 104-116: The code currently uses truthiness checks on optional
formatter fields which skips valid falsy values (e.g., timedelta(0)); update the
conditionals to explicit None checks: replace "if record.time_passed:" with "if
record.time_passed is not None:" and keep the microseconds branch as an explicit
check (e.g., "if record.time_passed.microseconds != 0:" or similar) so
zero-duration is handled, and replace "if record.deep:" with "if record.deep is
not None:"; ensure the logic inside (replacing "{time_passed}" in record.msg,
adjusting self._style._fmt, and appending ": {deep}" behavior) remains unchanged
apart from these None checks.

In `@src/rapids_singlecell/preprocessing/_neighbors/__init__.py`:
- Around line 190-193: The auto-adjustment block that sets n_neighbors when
n_neighbors > adata.shape[0] needs an explicit guard for empty AnnData: check if
adata.shape[0] == 0 before adjusting and raise a clear ValueError (or log and
exit) instead of proceeding to set n_neighbors to 1; update the code around the
n_neighbors adjustment (the block that references adata.shape and n_neighbors in
src/rapids_singlecell/preprocessing/_neighbors/__init__.py) so empty input is
detected early and handled with an informative error message to avoid later KNN
failures.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ce41bad9-ff8d-4e9c-ac3c-2dbc9a4ddc78

📥 Commits

Reviewing files that changed from the base of the PR and between 55b331f and 1850beb.

📒 Files selected for processing (20)
  • src/rapids_singlecell/__init__.py
  • src/rapids_singlecell/_settings/__init__.py
  • src/rapids_singlecell/_settings/verbosity.py
  • src/rapids_singlecell/decoupler_gpu/_helper/_log.py
  • src/rapids_singlecell/logging.py
  • src/rapids_singlecell/preprocessing/_harmony_integrate.py
  • src/rapids_singlecell/preprocessing/_hvg/__init__.py
  • src/rapids_singlecell/preprocessing/_neighbors/__init__.py
  • src/rapids_singlecell/preprocessing/_normalize.py
  • src/rapids_singlecell/preprocessing/_pca.py
  • src/rapids_singlecell/preprocessing/_regress_out.py
  • src/rapids_singlecell/preprocessing/_scale.py
  • src/rapids_singlecell/preprocessing/_scrublet/__init__.py
  • src/rapids_singlecell/preprocessing/_scrublet/core.py
  • src/rapids_singlecell/preprocessing/_simple.py
  • src/rapids_singlecell/tools/_clustering.py
  • src/rapids_singlecell/tools/_rank_genes_groups/__init__.py
  • src/rapids_singlecell/tools/_tsne.py
  • src/rapids_singlecell/tools/_umap.py
  • tests/test_logging.py

Comment on lines +104 to +116
if record.time_passed:
if record.time_passed.microseconds:
record.time_passed = timedelta(
seconds=int(record.time_passed.total_seconds())
)
if "{time_passed}" in record.msg:
record.msg = record.msg.replace(
"{time_passed}", str(record.time_passed)
)
else:
self._style._fmt += " ({time_passed})"
if record.deep:
record.msg = f"{record.msg}: {record.deep}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use explicit None checks for optional formatter fields.

Please avoid truthiness checks on optional values here. This also fixes the zero-duration edge case (timedelta(0)) being skipped.

Suggested patch
-        if record.time_passed:
+        if record.time_passed is not None:
             if record.time_passed.microseconds:
                 record.time_passed = timedelta(
                     seconds=int(record.time_passed.total_seconds())
                 )
             if "{time_passed}" in record.msg:
                 record.msg = record.msg.replace(
                     "{time_passed}", str(record.time_passed)
                 )
             else:
                 self._style._fmt += " ({time_passed})"
-        if record.deep:
+        if record.deep is not None:
             record.msg = f"{record.msg}: {record.deep}"
As per coding guidelines, `**/*.py`: Use `is None` and `is not None` for optional parameters instead of truthiness checks.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if record.time_passed:
if record.time_passed.microseconds:
record.time_passed = timedelta(
seconds=int(record.time_passed.total_seconds())
)
if "{time_passed}" in record.msg:
record.msg = record.msg.replace(
"{time_passed}", str(record.time_passed)
)
else:
self._style._fmt += " ({time_passed})"
if record.deep:
record.msg = f"{record.msg}: {record.deep}"
if record.time_passed is not None:
if record.time_passed.microseconds:
record.time_passed = timedelta(
seconds=int(record.time_passed.total_seconds())
)
if "{time_passed}" in record.msg:
record.msg = record.msg.replace(
"{time_passed}", str(record.time_passed)
)
else:
self._style._fmt += " ({time_passed})"
if record.deep is not None:
record.msg = f"{record.msg}: {record.deep}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/logging.py` around lines 104 - 116, The code currently
uses truthiness checks on optional formatter fields which skips valid falsy
values (e.g., timedelta(0)); update the conditionals to explicit None checks:
replace "if record.time_passed:" with "if record.time_passed is not None:" and
keep the microseconds branch as an explicit check (e.g., "if
record.time_passed.microseconds != 0:" or similar) so zero-duration is handled,
and replace "if record.deep:" with "if record.deep is not None:"; ensure the
logic inside (replacing "{time_passed}" in record.msg, adjusting
self._style._fmt, and appending ": {deep}" behavior) remains unchanged apart
from these None checks.

Comment on lines +190 to +193
if n_neighbors > adata.shape[0]:
n_neighbors = 1 + int(0.5 * adata.shape[0])
logg.warning(f"n_obs too small: adjusting to `n_neighbors = {n_neighbors}`")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard empty AnnData before neighbor auto-adjustment.

When adata.shape[0] == 0, the new adjustment produces n_neighbors = 1, which is still invalid and fails later in KNN with a less direct error. Add an explicit early check for empty input.

💡 Proposed fix
+    if adata.shape[0] == 0:
+        raise ValueError("Cannot compute neighbors on empty AnnData (`n_obs == 0`).")
+
     if n_neighbors > adata.shape[0]:
         n_neighbors = 1 + int(0.5 * adata.shape[0])
         logg.warning(f"n_obs too small: adjusting to `n_neighbors = {n_neighbors}`")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if n_neighbors > adata.shape[0]:
n_neighbors = 1 + int(0.5 * adata.shape[0])
logg.warning(f"n_obs too small: adjusting to `n_neighbors = {n_neighbors}`")
if adata.shape[0] == 0:
raise ValueError("Cannot compute neighbors on empty AnnData (`n_obs == 0`).")
if n_neighbors > adata.shape[0]:
n_neighbors = 1 + int(0.5 * adata.shape[0])
logg.warning(f"n_obs too small: adjusting to `n_neighbors = {n_neighbors}`")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rapids_singlecell/preprocessing/_neighbors/__init__.py` around lines 190
- 193, The auto-adjustment block that sets n_neighbors when n_neighbors >
adata.shape[0] needs an explicit guard for empty AnnData: check if
adata.shape[0] == 0 before adjusting and raise a clear ValueError (or log and
exit) instead of proceeding to set n_neighbors to 1; update the code around the
n_neighbors adjustment (the block that references adata.shape and n_neighbors in
src/rapids_singlecell/preprocessing/_neighbors/__init__.py) so empty input is
detected early and handled with an informative error message to avoid later KNN
failures.

@codecov-commenter
Copy link
Copy Markdown

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants