Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
181 changes: 176 additions & 5 deletions .claude/CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,14 +91,38 @@ Fix spelling, grammar, and other minor problems without asking the user. Label a

Only report what you have changed.

## Programming rules for LLMs
1. mostly lower-case comments
2. no in-line comments
3. don't number comments
4. comments are followed by '----', such as "# dependencies----". nothing that takes a full line.
5. no unnecessary code changes beyond what is already done
6. unless I ask, don't print too much code at once
7. don't do unnecessary print statements within the code.
8. don't add unnecessary try/catch statements or make unnecessary validation checks. i'm working by myself, no need for these things -- I want to see the errors!
9. NO EMOJIS
10. in R code, 2 spaces per tab and base R pipe

## Refactor rules
We're attemping to refactor many issues with this repositroy. In general, let's:
- make sure all examples are functional and not wrapped in dontrun{}

### Per-function checklist

When refactoring any estimator or internal function, apply all of these:

1. Make runnable examples (no `\dontrun{}`)
2. Remove commented-out code
3. Remove TODOs
4. Add type checks (checkmate)
5. Factor out tidyverse (`filter` → `df[cond, ]`, `mutate` → direct assignment)
6. `<-` for assignment (not `=`)
7. Drop debug prints (`cat`, `print`, commented `# print(...)`)
8. Consistent variable naming without periods (e.g. `pi_S` not `pi.S`)
9. Drop backtick column access (`temp$\`piA\`` → `temp$piA`)

### General rules

- make sure we have test cases for all functions
- remove @TODO blocks and other ugly code
- remove magrittr pipe and replace with base R pipe
- remove tidyverse dependencies
- replace "=" with "<-"
- make sure every function is documented
- add validation to function inputs

Expand All @@ -109,3 +133,150 @@ Run this one-liner to validate the package before committing:
```
Rscript -e "devtools::document()" && Rscript -e "styler::style_pkg()" && Rscript -e "spelling::spell_check_package()" && Rscript -e "lintr::lint_package()" && Rscript -e "devtools::check(vignettes = FALSE)"
```

## Changing the API

### Goals

1. **Better method constructors** — replace `setup_method_weighting(method_name="IPW", ...)` with `ec_ipw()`, etc. Each constructor carries its own estimation logic.
2. **Polymorphic dispatch** — `run_analysis()` calls a generic on the method object instead of an if/else tree. Adding a new method = writing one constructor.
3. **Merge bootstrap** — bootstrap is an inference option on the method, not a separate code path.
4. **(Future) Model formula interface** — `outcome ~ treatment | covariates` instead of column name args.

### Current workflow (to deprecate)

```r
method <- setup_method_weighting(
method_name = "IPW",
optimal_weight_flag = FALSE,
wt = 0,
model_form_piS = "S ~ x1 + x2 + x3 + x4 + x5"
)

analysis <- setup_analysis_primary(
data = SyntheticData,
trial_status_col_name = "S",
treatment_col_name = "A",
outcome_col_name = c("y1", "y2"),
covariates_col_name = c("x1", "x2", "x3", "x4", "x5"),
method_weighting_obj = method
)

res <- run_analysis(analysis)
```

### Desired workflow

```r
method <- ec_ipw(
ps_formula = "S ~ x1 + x2 + x3 + x4 + x5",
weight = NULL, # NULL = optimal, 0 = no borrowing, 0.3 = fixed
bootstrap = 500, # NULL = sandwich SE only
bootstrap_ci_type = NULL # NULL defaults to "perc" when bootstrap is set
)

analysis <- setup_analysis(
data = SyntheticData,
outcomes = c("y1", "y2"),
treatment = "A",
trial_status = "S",
covariates = c("x1", "x2", "x3", "x4", "x5"),
method = method
)

res <- run_analysis(analysis)
```

### Method constructors

| Constructor | Replaces | Phase |
|---|---|---|
| `ec_ipw()` | `setup_method_weighting(method_name="IPW", ...)` | Primary |
| `ec_aipw()` | `setup_method_weighting(method_name="AIPW", ...)` | Primary |
| `did_ec_ipw()` | `setup_method_DID(method_name="IPW", ...)` | OLE |
| `did_ec_aipw()` | `setup_method_DID(method_name="AIPW", ...)` | OLE |
| `did_ec_or()` | `setup_method_DID(method_name="OR", ...)` | OLE |
| `scm()` | `setup_method_SCM(...)` | OLE |

Each constructor returns an S4 method object. The S4 class defines a generic `estimate()` that `run_analysis()` dispatches on — no if/else.

### How dispatch works

```r
# S4 generic
setGeneric("estimate", function(method, data, ...) standardGeneric("estimate"))

# Each method class implements estimate()
setMethod("estimate", "ec_ipw_method", function(method, data, ...) {
# IPW estimation logic lives here
})

# run_analysis() becomes:
run_analysis <- function(analysis_obj) {
estimate(analysis_obj@method, data = analysis_obj@data, ...)
}
```

### Interim dispatch (during migration)

While new method constructors coexist with the old if/else tree in `run_analysis()`, each new class gets an `else if` block at the end of `run_analysis()` that calls `estimate()`:

```r
# In run_analysis.R, BEFORE the final } else { stop(...) }:
} else if (is(method, "ec_ipw_method")) {
res <- estimate(method,
data = data,
outcomes = outcome_col_name,
treatment = treatment_col_name,
trial_status = trial_status_col_name,
covariates = covariates_col_name,
alpha = alpha,
quiet = quiet
)
}
```

This pattern is repeated for each new method class as it's created. The old if/else branches for the legacy classes remain untouched. Once all 6 methods are migrated, the entire if/else tree is replaced with a single `estimate()` call.

New method objects inherit from `method_primary_obj` or `method_OLE_obj`, so they pass the existing `checkmate::assert_class(method, "method_primary_obj")` validation in `setup_analysis_primary()`.

### Implementation order

1. Create full-pipeline regression tests for all 6 methods (old API, locked numerical values) ✓
2. Create all 6 method constructors (start with `ec_ipw()`)
3. Each constructor returns an S4 object with estimation logic via `estimate()` generic
4. For each new method, add an `else if (is(method, "xxx_method"))` to `run_analysis()`
5. Add new-API tests to each pipeline test file (same expected values)
6. Once all 6 are done: refactor `setup_analysis()` into a single function (merge `_primary`/`_OLE`, add `T_cross = NULL`)
7. Once all 6 are done: replace the entire if/else in `run_analysis()` with one `estimate()` call
8. Deprecate `setup_method_weighting`, `setup_method_DID`, `setup_method_SCM`, `setup_analysis_primary`, `setup_analysis_OLE`

### Design decisions

- **S4 classes for method objects** — keeps rigorous type definitions, consistent with existing package patterns.
- **`T_cross` goes in `setup_analysis()`** — it's a property of the study design, not the method.
- **`bootstrap_ci_type` is nullable** — defaults to `"perc"` when `bootstrap` is non-NULL, ignored otherwise.

### Full-pipeline regression tests

Before refactoring any estimator, we lock in its numerical outputs on `SyntheticData` so any code change that alters results is caught. Tests use the old API (setup_method → setup_analysis → run_analysis) with exact values at `tolerance = 1e-6`. As new API constructors are added, we add parallel assertions against the same expected values.

Rename existing `test-vignette_results_*` files and split by method:

| Test file | Method | Covers |
|---|---|---|
| `test-full_pipeline_ec_ipw.R` | EC-IPW | weight=0, optimal, fixed (0.3), bootstrap point estimates |
| `test-full_pipeline_ec_aipw.R` | EC-AIPW | weight=0, optimal, fixed (0.3), bootstrap point estimates |
| `test-full_pipeline_did_ec_ipw.R` | DID-EC-IPW | bootstrap CIs, point estimates |
| `test-full_pipeline_did_ec_aipw.R` | DID-EC-AIPW | bootstrap CIs, point estimates |
| `test-full_pipeline_did_ec_or.R` | DID-EC-OR | bootstrap CIs, point estimates |
| `test-full_pipeline_scm.R` | SCM | bootstrap CIs, point estimates |

Each test file asserts:
- Point estimates (exact values)
- Standard errors / standard deviations (exact values)
- CI bounds for non-bootstrap (exact values)
- Bootstrap point estimates match non-bootstrap
- Borrow weight (where applicable)

Simulation tests (`test-vignette_results_*_simulation.R`) stay separate — they test Monte Carlo properties, not individual estimator outputs.
4 changes: 2 additions & 2 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -47,5 +47,5 @@ jobs:
- uses: r-lib/actions/check-r-package@v2
with:
upload-snapshots: true
build_args: '"--no-manual"'
args: 'c("--no-manual", "--as-cran")'
build_args: 'c("--no-manual", "--no-build-vignettes")'
args: 'c("--no-manual", "--ignore-vignettes", "--as-cran")'
2 changes: 2 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,8 @@ Collate:
'method_weighting_class.R'
'analysis_primary_class.R'
'data.R'
'ec_ipw.R'
'ec_aipw.R'
'package.R'
'rdborrow-package.R'
'run_analysis.R'
Expand Down
3 changes: 3 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# Generated by roxygen2: do not edit by hand

export(ec_aipw)
export(ec_ipw)
export(estimate)
export(run_analysis)
export(run_simulation)
export(setup_analysis_OLE)
Expand Down
File renamed without changes.
Loading
Loading