
Add metrics parameter to data_tabulate() #689

Open
elinw wants to merge 24 commits into easystats:main from elinw:measures

@elinw

@elinw elinw commented Jun 12, 2026

Contributor

There were a few ways to approach this. The cumulative values depend on the valid values, so it could make sense to set any measures that were not requested to NULL at the end, the way I did for valid.

I added a test for measures = "raw"; I could add tests for other variations as well.

Fixes #685

@elinw

elinw commented Jun 12, 2026

Contributor Author

#685

Comment thread R/data_tabulate.R Outdated
Comment thread R/data_tabulate.R Outdated
Comment thread tests/testthat/test-data_tabulate.R
Comment thread R/data_tabulate.R Outdated

This comment was marked as duplicate.

@etiennebacher etiennebacher left a comment

Member

On top of the other comments to address above, please add a bullet point in NEWS.md and add an example for the new argument in the docs of data_tabulate().

(@strengejacke I hid all the copilot comments because they were just duplicates of your comments and adding noise to this PR)

@elinw

elinw commented Jun 12, 2026

Contributor Author

I was wondering if it should be documented that measures = c() will return the frequencies only.
Also, that made me wonder if "N" should also be optional.

@elinw elinw requested a review from etiennebacher June 12, 2026 19:07
@codecov

codecov Bot commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.11%. Comparing base (89daeba) to head (258bd93).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #689      +/-   ##
==========================================
+ Coverage   92.09%   92.11%   +0.02%     
==========================================
  Files          76       76              
  Lines        7727     7766      +39     
==========================================
+ Hits         7116     7154      +38     
- Misses        611      612       +1     

@etiennebacher etiennebacher left a comment

Member

I was wondering if it should be documented that measures = c() will return the frequencies only.

Yes I think so.

Also, that made me wonder if "N" should also be optional.

I don't know about that but if it's not included then I think this measures parameter should be renamed shares. @strengejacke WDYT?


The workflow for code styling fails. We use Air to automatically format code so you can install it and run it on the project to fix this failure.

Comment thread R/data_tabulate.R
Comment thread R/data_tabulate.R Outdated
Comment on lines +44 to +45
#' `NULL`. Can be `"raw"` (includes `NA` values), `"valid"` (excludes `NA` values)
#' or `"cumulative"` (excludes `NA` values).

Member

Suggested change
- #' `NULL`. Can be `"raw"` (includes `NA` values), `"valid"` (excludes `NA` values)
- #' or `"cumulative"` (excludes `NA` values).
+ #' `NULL`. Can be several choices among `"raw"` (includes `NA` values),
+ #' `"valid"` (excludes `NA` values) and `"cumulative"` (excludes `NA` values).

Contributor Author

I changed the wording but not exactly to yours. See what you think.

Comment thread tests/testthat/test-data_tabulate.R Outdated
Comment thread NEWS.md Outdated
Comment on lines +23 to +24
* `data_tabulate()` gain a `measures` argument to allow selection of columns to
display ("raw", "valid", and "cumulative") (#689).

Member

You can add your github username if you want: (#689, <username>)

@etiennebacher etiennebacher changed the title from "Add measures parameter to control columns displayed in frequency tables." to "Add measures parameter to data_tabulate()" Jun 14, 2026
@elinw

elinw commented Jun 15, 2026

Contributor Author

I was wondering if it should be documented that measures = c() will return the frequencies only.

Yes I think so.

Also, that made me wonder if "N" should also be optional.

I don't know about that but if it's not included then I think this measures parameter should be renamed shares. @strengejacke WDYT?

The workflow for code styling fails. We use Air to automatically format code so you can install it and run it on the project to fix this failure.

"shares" is not meaningful to me. Maybe "percentages" would be more descriptive?

@elinw elinw requested a review from etiennebacher June 15, 2026 21:44
@strengejacke

Member

If N is included, percentages would not fit perfectly. I would say metrics (if N should be included) or percentages is fine.

@elinw

elinw commented Jun 18, 2026

Contributor Author

If N is included, percentages would not fit perfectly. I would say metrics (if N should be included) or percentages is fine.

This sounds good. Can you let me know if N definitely should be included?
I'll do a commit for the name change alone first.

@elinw

elinw commented Jun 18, 2026

Contributor Author

If N is included, percentages would not fit perfectly. I would say metrics (if N should be included) or percentages is fine.

This sounds good. Can you let me know if N definitely should be included? I'll do a commit for the name change alone first.

I added the "N" (in uppercase) option. I did have to change one additional function that assumed that the N column would be present. All of the tests still pass.

On the tests, the linter does not like testing for metrics = c(). I do think people will write that exactly so it's good to check, but identical(c(), NULL) is TRUE so technically it's a redundant test. Let me know what you think.
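
The point about `c()` and `NULL` can be checked directly in base R. The sketch below is only an illustration: `wants_no_metrics()` is a hypothetical helper, not a function in datawizard, showing how a single check could cover `c()`, `NULL` and a zero-length character vector.

```r
# c() with no arguments evaluates to NULL, so both spellings pass the same value
empty_vector <- c()
is.null(empty_vector)
identical(empty_vector, NULL)
length(empty_vector)

# Hypothetical helper (not part of datawizard): TRUE when no metric was requested.
# character(0) is not NULL, but it also has length zero, so it is treated the same.
wants_no_metrics <- function(metrics) {
  is.null(metrics) || length(metrics) == 0L
}

wants_no_metrics(c())
wants_no_metrics(NULL)
wants_no_metrics(character(0))
wants_no_metrics("raw")
```

Because the two calls are indistinguishable inside the function, a test written with `metrics = c()` exercises the same code path as `metrics = NULL`; keeping it would only document the spelling users are likely to type.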

@elinw elinw changed the title from "Add measures parameter to data_tabulate()" to "Add metrics parameter to data_tabulate()" Jun 18, 2026
@elinw

elinw commented Jun 19, 2026

Contributor Author

I think everything is resolved unless you have new requests.

Comment thread R/data_tabulate.R
Comment on lines +42 to +47
#' @param metrics Optional character vector, indicating the types of
#' percents to be included. Only applies to frequencies, i.e. when `by` is
#' `NULL`. Can include any combination of `N` (frequencies including NA), `"raw"` (includes `NA` values),
#' `"valid"` (excludes `NA` values) or `"cumulative"` (excludes `NA` values).
#' Using `c()` will return a table with only the value labels. Invalid
#' values (`metrics = "foo"`) are silently ignored.

@etiennebacher etiennebacher Jun 19, 2026

Member

Suggested change
- #' @param metrics Optional character vector, indicating the types of
- #' percents to be included. Only applies to frequencies, i.e. when `by` is
- #' `NULL`. Can include any combination of `N` (frequencies including NA), `"raw"` (includes `NA` values),
- #' `"valid"` (excludes `NA` values) or `"cumulative"` (excludes `NA` values).
- #' Using `c()` will return a table with only the value labels. Invalid
- #' values (`metrics = "foo"`) are silently ignored.
+ #' @param metrics Optional character vector, indicating the types of
+ #' metrics to be included. Only applies to frequencies, i.e. when `by` is
+ #' `NULL`. Can include any combination of `N` (frequencies including `NA`),
+ #' `"raw"` (percentage including `NA` values), `"valid"` (percentage excluding
+ #' `NA` values) or `"cumulative"` (percentage excluding `NA` values).

The part about silently ignoring unknown values seemed wrong (it errors) so I removed it.
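
To make the four metrics in the suggested wording concrete, here is a minimal base-R sketch that computes them by hand for a small vector. It does not call `data_tabulate()`; the column names (`raw`, `valid`, `cumulative`) and the column selection at the end are assumptions for illustration, not the package's implementation or output format.

```r
x <- c("a", "b", "a", NA, "b", "a")

# N: counts per value, with NA kept as its own row
n <- table(x, useNA = "always")
counts <- as.vector(n)
is_missing <- is.na(names(n))

# "raw": percentage of all observations, NA included in the denominator
raw_pct <- 100 * counts / sum(counts)

# "valid": percentage among non-missing observations only (NA row left empty)
valid_pct <- ifelse(is_missing, NA_real_, 100 * counts / sum(counts[!is_missing]))

# "cumulative": running total of the valid percentages (NA row left empty)
cumulative_pct <- cumsum(ifelse(is_missing, 0, valid_pct))
cumulative_pct[is_missing] <- NA_real_

freq <- data.frame(
  Value = names(n),
  N = counts,
  raw = raw_pct,
  valid = valid_pct,
  cumulative = cumulative_pct
)

# Assumed selection step: keep only the requested metric columns, in table order
metrics <- c("N", "valid")
selected <- freq[, c("Value", intersect(c("N", "raw", "valid", "cumulative"), metrics))]
selected
```

The sketch also shows why the cumulative column depends on the valid one: it is a running sum of the valid percentages, so it has to be computed before any unrequested columns are dropped.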

Comment thread R/data_tabulate.R Outdated
Comment thread R/data_tabulate.R
}
out <- cbind(var_info, out)
}
total_n <- sum(out$N, na.rm = TRUE)

Member

Why has this line moved? I don't think it causes an issue in itself but I'd rather minimize the git diff.

Comment thread NEWS.md Outdated
@etiennebacher

Member

Can you also resolve the conflicts?

@elinw

elinw commented Jun 21, 2026

Contributor Author

I tried running Air on the file but the only issue it identified was duplicate blank lines. Not sure why this is failing.

@elinw elinw requested a review from etiennebacher June 21, 2026 19:01
@etiennebacher

etiennebacher commented Jun 22, 2026

Member

The problem was that a new version of Air was released a couple of days ago and replaces = by <- by default (when possible). This was clashing with our custom syntax that can be used in data_filter().

There are just the few comments above to resolve now (don't worry about linting failures).

@elinw

elinw commented Jun 23, 2026

Contributor Author

The problem was that a new version of Air was released a couple of days ago and replaces = by <- by default (when possible). This was clashing with our custom syntax that can be used in data_filter().

There are just the few comments above to resolve now (don't worry about linting failures).

How about having a separate branch for those, since the comments are in unrelated files?

@etiennebacher

Member

Yes I have actually merged a PR that fixes those Air failures so that these changes don't appear here. Your latest commit message says you left questions on some comments but I don't see anything, am I missing something? I just want to avoid a situation where we're both waiting for something from the other.

@elinw

elinw commented Jun 23, 2026

Contributor Author

Yes I have actually merged a PR that fixes those Air failures so that these changes don't appear here. Your latest commit message says you left questions on some comments but I don't see anything, am I missing something? I just want to avoid a situation where we're both waiting for something from the other.

I think, in the end, the only question I had was about metrics = "foo" and that doesn't matter.

@etiennebacher

Member

There are still unresolved comments above about docs/NEWS/line changed. I have made "suggestions" using Github interface but didn't actually apply them.

@elinw

elinw commented Jun 25, 2026

Contributor Author

I'm not sure why two builds are failing, but the failures don't seem related to this PR. R crashed while running test-standardize_models.R, but retriggering the builds might fix it. I'm not sure what would have changed.
R 4.6.1 was released today, so I guess it could be related to that.

Update:

── Warning (test-standardize-data.R:23:1): standardize.numeric ─────────────────
Global state has changed:
names(before[[4]]) | names(after[[4]])
[79] "TESTTHAT" | "TESTTHAT" [79]
[80] "TESTTHAT_WD" | "TESTTHAT_WD" [80]
[81] "TMPDIR" | "TMPDIR" [81]
- "TZDIR" [82]
[82] "USER" | "USER" [83]
[83] "XPC_FLAGS" | "XPC_FLAGS" [84]
[84] "XPC_SERVICE_NAME" | "XPC_SERVICE_NAME" [85]
before[[4]][79:84] vs after[[4]][79:85]
TERM "xterm-256color"
TESTTHAT "true"
TESTTHAT_WD "/Users/elinwaring/Code/rcode/datawizard"
TMPDIR "/var/folders/gw/_qnrkpdn3fjg_0gwwnb9h_g40000gn/T/"
+ TZDIR "/var/db/timezone/zoneinfo"
USER "elinwaring"
XPC_FLAGS "0x0"
XPC_SERVICE_NAME "application.com.rstudio.desktop.371427447.371427452"
[ FAIL 0 | WARN 1 | SKIP 0 | PASS 75 ]

Update 2:

https://discourse.mc-stan.org/t/global-state-has-changed-warning/36591

Was this resolved for the report package?

Update 3:

Apparently they resolved this by deleting tests/testthat/helper-state.R

easystats/report#471
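
The warning in the log above reports an environment variable (`TZDIR`) that is present after the test but not before it. As a rough illustration of that kind of before/after comparison, here is a base-R sketch. It is not the contents of helper-state.R and does not reproduce what testthat does internally; `DEMO_TZDIR` is a made-up variable name used only for this example.

```r
# Snapshot the names of all environment variables before the "test" runs
before <- names(Sys.getenv())

# Stand-in for code under test that sets a variable as a side effect.
# DEMO_TZDIR is a hypothetical name chosen for this illustration.
Sys.setenv(DEMO_TZDIR = "/tmp/zoneinfo")

# Snapshot again afterwards
after <- names(Sys.getenv())

# Variables present afterwards but not before: the kind of difference
# that a global-state check would flag
leaked <- setdiff(after, before)
leaked

# Restore the previous state so later checks start clean
Sys.unsetenv("DEMO_TZDIR")
```

A check like this flags any side effect on the session, whether or not it comes from the code being tested, which is consistent with the warning showing up in a test file this PR does not touch.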

Development

Successfully merging this pull request may close these issues.

Cumulative distribution should be optional

4 participants