[6/6] Add C. elegans worked example; rewrite README around new knobs #10
Open
alexanderbates wants to merge 6 commits into
Conversation
…as kwargs
Replaces the hardcoded NEG_NEUROTRANSMITTERS module constant with two
explicit constructor arguments so that the library no longer pre-empts
the user's neurotransmitter sign assignment:
- inhibitory_nts: pre-neuron top_nt values to negate when signed=True
  (required when signed=True; a ValueError is raised if it is omitted).
- excluded_nts: pre-neuron top_nt values to drop entirely from W,
independent of signed=True/False. Useful for transmitter classes
whose net sign at a given target depends on the receptor mix and so
cannot be assigned a single sign safely.
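An illustrative sketch of these two semantics (plain dicts, not the library's internals, which build a sparse PETSc matrix): edges whose presynaptic top_nt is in excluded_nts are dropped outright, while inhibitory_nts only flips signs and only when signed=True.

```python
# Toy version of the excluded_nts / inhibitory_nts behaviour described
# above. Each edge carries the presynaptic neuron's top_nt annotation.
def prepare_edges(edges, signed=False, inhibitory_nts=None, excluded_nts=frozenset()):
    if signed and not inhibitory_nts:
        raise ValueError("signed=True requires inhibitory_nts")
    out = []
    for e in edges:
        if e["top_nt"] in excluded_nts:
            continue  # dropped from W regardless of signed=True/False
        w = e["count"]
        if signed and e["top_nt"] in inhibitory_nts:
            w = -w    # negate edges from inhibitory presynaptic neurons
        out.append({**e, "count": w})
    return out

edges = [
    {"pre": "a", "post": "b", "top_nt": "gaba", "count": 3},
    {"pre": "b", "post": "c", "top_nt": "acetylcholine", "count": 5},
    {"pre": "c", "post": "a", "top_nt": "glutamate", "count": 2},
]
prepared = prepare_edges(edges, signed=True,
                         inhibitory_nts={"gaba"},
                         excluded_nts={"glutamate"})
# gaba edge negated, glutamate edge dropped, ACh edge untouched
```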
Adds lambda_max as a constructor argument (default 0.99 for backwards
compatibility). _normalize_W now always rescales to lambda_max exactly
rather than only capping when the natural eigenvalue exceeds it, so the
parameter is a true control knob over leading-mode amplification rather
than just a stability ceiling. The amplification of the leading mode in
(I - W_rescaled)^-1 is 1 / (1 - lambda_max), so 0.99 gives ~100x and
0.5 gives ~2x.
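The always-rescale behaviour can be sketched with a toy power iteration (stdlib-only illustration, not `_normalize_W` itself): estimate the spectral radius of W, scale W so its leading eigenvalue equals lambda_max exactly, and note the resulting leading-mode gain 1 / (1 - lambda_max) in (I - W)^-1.

```python
import math

def matvec(W, v):
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def spectral_radius(W, iters=200):
    # Power iteration: repeatedly apply W and renormalise; the norm of
    # W @ v converges to |lambda_1| for a dominant eigenvalue.
    v = [1.0] * len(W)
    for _ in range(iters):
        v = matvec(W, v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    Wv = matvec(W, v)
    return math.sqrt(sum(x * x for x in Wv))

W = [[0.6, 0.3], [0.2, 0.7]]          # toy matrix, eigenvalues 0.9 and 0.4
lam = spectral_radius(W)              # ~0.9
lambda_max = 0.5
W_rescaled = [[w * lambda_max / lam for w in row] for row in W]
gain = 1.0 / (1.0 - lambda_max)       # leading-mode amplification: 2x here
```

With lambda_max=0.99 the same formula gives ~100x, matching the figures above.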
Surfaces syn_weight_measure ('count' or 'norm') as a constructor
argument and changes the default from 'norm' to 'count'. Fixes a
pre-existing bug in _create_sparse_W: the signed=True path negated the
'count' column unconditionally, but the matrix was populated from the
column named by syn_weight_measure (default 'norm'), so the signed flag
silently produced the same matrix as signed=False. The negation now
applies to the column actually consumed. An inline comment notes that
flipping signs on 'norm' breaks the column-sums-to-1 interpretation, so
'count' is the more natural choice in signed mode.
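The essence of the fix, as a toy sketch (assumed shapes, not the `_create_sparse_W` code): the negation must be applied to whichever column actually populates the matrix, not hardcoded to 'count'.

```python
# Negate the column named by syn_weight_measure -- the one consumed
# when populating W -- rather than 'count' unconditionally.
def edge_weight(row, syn_weight_measure="count", signed=False, inhibitory_nts=()):
    w = row[syn_weight_measure]
    if signed and row["top_nt"] in inhibitory_nts:
        w = -w
    return w

row = {"count": 4, "norm": 0.25, "top_nt": "gaba"}
edge_weight(row, "norm", signed=True, inhibitory_nts={"gaba"})   # -0.25
edge_weight(row, "count", signed=True, inhibitory_nts={"gaba"})  # -4
```

Before the fix, the 'norm' path would have returned +0.25 here, making signed=True a no-op for the default weight measure.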
Sign preservation: _build_influence_dataframe now keeps the real part
of the steady-state vector in signed mode rather than always taking the
magnitude, so net-inhibited targets carry a negative score.
Validates lambda_max in (0, 1) and syn_weight_measure in {'count',
'norm'}. When signed=True or excluded_nts is set, the SQLite meta
table must include a 'top_nt' column or _create_sparse_W raises.
calculate_influence now returns both the raw influence column and the
three log-compressed adjusted_influence columns by default. Users can
compare adjusted vs unadjusted scores from a single call rather than
having to import adjust_influence and post-process the output
themselves; opt out with adjust=False. The log compression is
parameterised via two new kwargs on calculate_influence: adjust_const
(the exp(-c) junk-node floor / +c shift, default 24) and adjust_signif
(rounding, default 6).
adjust_influence is added as a module-level function so advanced
workflows can still post-process aggregated DataFrames (e.g. summing
per-(target_class, seed_class) across multiple seeds before log
compression in a worked example). Its output is three columns:
- adjusted_influence = sign(x) * (log(max(|x|, exp(-const))) + const)
- adjusted_influence_norm_by_targets (divides by n_targets per group)
- adjusted_influence_norm_by_sources_and_targets (divides by
  n_sources * n_targets per group)
The function dispatches on the presence of 'target' and 'seed'
columns: when present it groups and sums per (target, seed); when
absent it treats each row as its own group, which is the case for the
DataFrame calculate_influence builds. Sign is preserved, so
signed-mode input yields signed-mode output.
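The scalar form of the transform can be sketched directly (a toy re-implementation for one value, not the library's DataFrame function): sign-preserving log compression with an exp(-const) floor, so raw influence of zero maps to adjusted influence of zero.

```python
import math

def adjusted_influence(x, const=24):
    # sign(x) * (log(max(|x|, exp(-const))) + const); the floor clamps
    # tiny magnitudes so that log never diverges and 0 maps to 0.
    if x == 0:
        return 0.0
    return math.copysign(math.log(max(abs(x), math.exp(-const))) + const, x)

adjusted_influence(0.0)    # 0.0: the junk-node floor
adjusted_influence(1.0)    # 24.0: log(1) + const
adjusted_influence(-1.0)   # -24.0: sign is preserved
```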
Replaces the legacy single-test scaffold
(tests/test_InfluenceCalculator.py plus toy_network_example.sqlite and
an example notebook) with a focused pytest suite that exercises the
constructor surface introduced by the recent parameter rework and the
integration of adjust_influence into calculate_influence:
- Constructor validation: signed=True without inhibitory_nts raises;
  lambda_max outside (0, 1) raises (parametrised over five values);
  unknown syn_weight_measure raises.
- Construction smoke: unsigned and signed builds, the 'norm'
  syn_weight_measure path, and excluded_nts dropping pre-neurons.
  Wrapped in pytest.importorskip so they skip cleanly on machines
  without PETSc/SLEPc rather than failing the suite.
- adjust_influence helper: column shape, log-plus-const anchor at the
  strongest influence, exp(-const) floor mapping zero raw influence to
  zero adjusted, sign preservation in signed mode, and the two
  validation errors (both score columns present, no score column).
- calculate_influence integration: default returns adjusted columns
  alongside raw, adjust=False returns raw only, signed mode produces
  some net-negative downstream targets.
- Bundled-data sanity check on column presence and row counts.
Bundles the C. elegans hermaphrodite chemical connectome (300 cells,
3,539 edges, 20,672 synapses) under InfluenceCalculator/data/ as two
CSVs plus an importable wrapper (celegans_edgelist(),
celegans_meta()). Provenance and citation BibTeX live in the module
docstring so help(InfluenceCalculator.data) surfaces them. The CSVs
are an OpenWorm distribution extract (accessed February 2026)
aggregating White et al. 1986 and Cook et al. 2019 with WormAtlas /
CenGen annotations.
The conftest fixture builds a temporary SQLite database from the
bundled CSVs to drive InfluenceCalculator's still-SQLite-only
constructor. Once the DataFrame / from_csv constructors land in a
later PR the fixture can collapse to a path handoff.
pyproject.toml:
- Bump setuptools requirement to >=77 so the SPDX-string license
  syntax (license = "BSD-3-Clause") from PEP 639 is accepted; recent
  setuptools warns against the dual-purpose license field used before.
- Bump requires-python to >=3.10 (matches the language features used
  internally and the lower bound of pandas / petsc4py wheels).
- Bump version to 0.2.0 to reflect the externalised neurotransmitter
  parameters, lambda_max, syn_weight_measure, sign-preserving signed
  mode, and adjust_influence integration.
- Refresh the description and project URLs (homepage, repository,
  issues, documentation now declared explicitly).
- Declare optional dependency extras (parquet, examples, test, dev) so
  a CI image can install just what it needs (pip install .[test])
  rather than dragging the worked-example matplotlib stack into a
  test-only build.
- Add [tool.setuptools.package-data] so the bundled
  InfluenceCalculator/data/*.csv files ship in the wheel.
- Add [tool.pytest.ini_options] testpaths so pytest discovers the
  suite without an explicit positional argument.
.gitignore: add the obvious development-time noise (__pycache__/,
.pytest_cache/, .venv/) and the Influence/ directory the legacy test
script wrote per-seed CSVs into.
The internal data structure built by InfluenceCalculator is a sparse
PETSc matrix populated from an edge list, so the input format closest
to that representation is a pandas DataFrame edge list (plus optional
metadata DataFrame). __init__ now takes those directly:
InfluenceCalculator(edgelist_df, meta_df=None, signed=False, ...)
Five classmethod loaders adapt other input formats to the DataFrame
__init__. Each one reads the format, then forwards every other kwarg
through **kwargs so the loaders do not have to repeat the constructor
signature:
- from_sql(filename, **kwargs) -- SQLite (meta + edgelist_simple)
- from_csv(edgelist_path, meta_path, ...)
- from_parquet(edgelist_path, meta_path, ...) (requires pyarrow / fastparquet)
- from_feather(edgelist_path, meta_path, ...) (requires pyarrow)
- from_numpy(adjacency_matrix, neuron_ids=None, meta_df=None, **kwargs)
This is a breaking change for callers using the previous SQLite-only
__init__. Update those call sites to InfluenceCalculator.from_sql.
from_numpy converts non-zero entries of the adjacency matrix into a
synthesised edge list with a 'count' column and forwards through
__init__. Because count_thresh is applied to that synthesised column,
callers passing pre-normalised float weights should set count_thresh=0.
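The adapter pattern behind from_numpy and its siblings can be sketched as follows (a minimal illustration with assumed shapes and a placeholder class name, not the library's exact implementation): non-zero adjacency entries become a synthesised edge list with a 'count' column, and every other kwarg is forwarded untouched to the DataFrame __init__ so the loader never repeats the constructor signature.

```python
class Calc:
    """Stand-in for InfluenceCalculator's DataFrame __init__."""
    def __init__(self, edgelist, meta=None, count_thresh=0, **kwargs):
        # count_thresh applies to the (possibly synthesised) count column
        self.edgelist = [e for e in edgelist if e["count"] >= count_thresh]

    @classmethod
    def from_numpy(cls, A, neuron_ids=None, **kwargs):
        ids = neuron_ids or list(range(len(A)))
        edges = [{"pre": ids[i], "post": ids[j], "count": A[i][j]}
                 for i in range(len(A)) for j in range(len(A[i]))
                 if A[i][j] != 0]
        return cls(edges, **kwargs)   # forwards count_thresh, signed, ...

A = [[0, 0.2], [0.7, 0]]
calc = Calc.from_numpy(A, neuron_ids=["a", "b"], count_thresh=0)
```

This also shows why count_thresh=0 matters for pre-normalised float weights: a default threshold tuned for integer synapse counts would silently discard fractional entries.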
Three module-level validation helpers -- _validate_meta,
_validate_and_prepare_edgelist, and _check_parquet_available /
_check_feather_available -- enforce the column requirements with
descriptive ValueErrors that name the missing column and list the
columns the caller actually passed. When 'norm' is absent it is
computed from 'count' as count / sum(count) per post; when 'weight' is
present (instead of 'count') it is treated as a pre-normalised input
and count_thresh is bypassed. The redundant inline top_nt checks
inside _create_sparse_W are removed since validation runs upfront.
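The 'norm' backfill described above amounts to a per-post normalisation; a toy stdlib version (plain dicts, not the pandas helper):

```python
def add_norm(edges):
    # norm = count / sum(count) over each postsynaptic neuron's
    # incoming edges, so per-post norms sum to 1.
    totals = {}
    for e in edges:
        totals[e["post"]] = totals.get(e["post"], 0) + e["count"]
    for e in edges:
        e["norm"] = e["count"] / totals[e["post"]]
    return edges

edges = add_norm([
    {"pre": "a", "post": "c", "count": 3},
    {"pre": "b", "post": "c", "count": 1},
])
# norms onto target 'c': 0.75 and 0.25
```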
Tests are updated to drive the DataFrame __init__ directly and add
format-equivalence checks (from_sql vs DataFrame, from_csv vs
DataFrame, from_numpy smoke). conftest.py exposes the bundled CSVs
four ways: as DataFrames, as filesystem paths via
importlib.resources.as_file(), and as a session-scoped temporary
SQLite database for from_sql.
examples/celegans_worked_example.py is a self-contained end-to-end run
on the bundled C. elegans connectome. Per-seed influence from every
sensory neuron (83 cells -> 46 cell classes after collapsing bilateral
pairs) onto every non-sensory target (187 -> 136 classes), summed per
(target_class, seed_class), log-adjusted via adjust_influence with an
auto-calibrated const = -log(min_nonzero |sum|), and rendered as two
heatmaps grouped by anatomical body_part with average-linkage
clustering within each group. The script encodes its design choices as
constants at the top: LAMBDA_MAX=0.5 (the small graph needs the leading
mode damped or it dominates every column), COUNT_THRESH=0,
EXCLUDE_BODY_PARTS={'pharynx'}, plus C. elegans neurotransmitter
conventions (only ACh and GABA have unambiguous signs; glutamate /
dopamine / serotonin / octopamine are excluded).
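The auto-calibration step can be sketched in a couple of lines (an illustrative helper, not the script's code): pick const so the smallest non-zero |summed influence| lands exactly on the exp(-const) floor.

```python
import math

def calibrate_const(values):
    # const = -log(min non-zero |x|): the weakest real signal sits
    # right at the floor, everything below it compresses to zero.
    nonzero = [abs(v) for v in values if v != 0]
    return -math.log(min(nonzero))

calibrate_const([0.0, -0.01, 2.5])   # -log(0.01), about 4.6
```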
The cell-class regex collapses trailing L/R suffixes only -- omitting
the seemingly more "complete" DL|DR|VL|VR alternatives because Python's
leftmost-first alternation would otherwise match DL at position 2 of
AVDL (with lookbehind V) and produce AV instead of AVD.
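The pitfall is easy to reproduce (illustrative patterns, not the script's exact regex): with the longer suffixes in the alternation, Python's leftmost-starting match strips 'DL' out of 'AVDL', while the L/R-only pattern strips just the trailing laterality letter.

```python
import re

safe = re.compile(r"[LR]$")                # trailing L/R only
greedy = re.compile(r"(DL|DR|VL|VR|L|R)$") # seemingly more "complete"

safe.sub("", "AVDL")    # 'AVD' -- the intended cell class
greedy.sub("", "AVDL")  # 'AV'  -- 'DL' matches at position 2, one char too early
```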
Two output heatmaps land in docs/images/ alongside four explanatory
images for the README: a source-to-targets schematic and an
adjusted_influence-vs-traversal-depth scatter pulled from the
natverse/influencer R sibling package, and a bespoke annotated linear
dynamical model + a 12s propagation-to-steady-state animation on a
28-node toy graph.
README is restructured around the parameters a user actually tunes:
- Description: derives the rescaling W_tilde = (lambda / lambda_max(W))
W with the reverb-knob metaphor, names lambda_max as the constructor
argument, and gives species-specific guidance (~0.99 for whole-CNS
Drosophila BANC, ~0.5 for C. elegans). syn_weight_measure documented
as a deliberate 'count' default with the rationale that signed mode
pairs cleanly with raw counts but distorts 'norm' (columns no longer
sum to 1).
- Usage: lists all six entry points (DataFrame __init__ plus from_sql,
from_csv, from_parquet, from_feather, from_numpy), names the required
edge list and meta columns, and explicitly states that missing
columns raise an actionable ValueError.
- adjust_influence section: explains the log-+-const transform and the
exp(-const) junk-node floor, gives the auto-calibration recipe for
non-BANC datasets, summarises the three output columns and when to
use which, and notes that calculate_influence runs adjust_influence
by default so the standalone import is only needed for advanced
multi-seed aggregation.
- Worked example section: minimal end-to-end snippet, knobs table
cross-referencing the Description and adjust_influence sections, the
two heatmaps, and a short data-source attribution to the OpenWorm
project + White 1986 + Cook 2019.
- BANC dataset section: alongside the Dataverse DOI, lists the Lee
Lab's public GCS bucket for the Feather edge list, with from_feather
consumption.
- Citation section between the worked example and Contributing: cites
Bates, Phelps, Kim, Yang et al. (2025) (bioRxiv 2025.07.31.667571,
PMID 40766407) in prose and BibTeX, and points users at the Zenodo
DOI badge for the software citation.
- One-line cross-link at the top of Description to natverse/influencer
(the R sibling package that wraps this library and provides a native
R backend).
Removes two pre-existing duplications in the Description (the
"this code computes the influence scores..." sentence appeared twice;
the trailing paragraph on creating a pandas dataframe and silencing
neurons was already covered by Usage).