Clarify SIPP is public-use; only IRS-PUF is access-restricted #809

Open
MaxGhenis wants to merge 1 commit into main from correct-sipp-licensing-language

@MaxGhenis
Contributor

Summary

  • Adds a `Licensing` section to `policyengine_us_data/datasets/sipp/README.md` clarifying that SIPP public-use data has no per-user license, agreement, or registration requirement.
  • Of the six upstream microdata sources the Enhanced CPS pipeline ingests (CPS, ACS, SCF, ORG, SIPP, IRS-PUF), only IRS-PUF has a genuine access restriction.
  • Makes it explicit that our HuggingFace mirror of `pu2023.csv` is a caching convenience, not an access-restriction workaround.

Why

At the 2026-04-21 meeting with Lars Vilhuber (AEA Data Editor), John Sabelhaus, and the TRACE team, John (who co-built the Sabelhaus subsynthetic beta) corrected a claim Max had been making: that SIPP requires individual user licensing. The actual SIPP vintage we use is Census public-use SIPP; the subsynthetic beta is separate. Both are effectively unrestricted. Overstating restrictions matters because it distorts which pipeline inputs genuinely warrant institutional-certification framing under TRACE.

No pipeline or ingest code changes; this is purely a docs guardrail to prevent regressions in external-communication writeups.

Test plan

  • `grep` for SIPP / licensing claims across the repo; no existing incorrect claims found, so nothing to fix
  • Render the README to confirm the new Licensing section formats correctly
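
The repo-wide search in the test plan can be approximated with a small script. This is a minimal sketch, not the command actually run for this PR: the regex patterns, the file suffixes, and the helper names `find_claims` and `scan_repo` are all assumptions chosen for illustration.

```python
import re
from pathlib import Path

# Hypothetical patterns; adjust to the wording actually used in the repo.
SIPP = re.compile(r"\bSIPP\b", re.IGNORECASE)
LICENSING = re.compile(
    r"licen[sc]|data[- ]use agreement|registration|restricted",
    re.IGNORECASE,
)


def find_claims(lines):
    """Return (line_number, text) for lines that mention SIPP and a licensing term."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if SIPP.search(line) and LICENSING.search(line)
    ]


def scan_repo(root, suffixes=(".md", ".py", ".rst", ".txt")):
    """Walk `root` and collect (path, line_number, text) hits from text-like files."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits += [(str(path), n, t) for n, t in find_claims(text.splitlines())]
    return hits
```

Calling `scan_repo(".")` from the repository root returns candidate lines for manual review. A hit is only a line where both patterns co-occur, so each one still needs a human read to decide whether it overstates a restriction.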

Fixes #808.

🤖 Generated with Claude Code

John Sabelhaus corrected a licensing overclaim in the 2026-04-21
meeting: the SIPP vintage we consume (Census public-use SIPP) has no
per-user license, data-use agreement, or registration requirement. Of
the six upstream sources the pipeline ingests (CPS, ACS, SCF, ORG,
SIPP, IRS-PUF), only IRS-PUF has a genuine access restriction. The
HuggingFace mirror of pu2023.csv is a caching convenience, not an
access-restriction workaround.

This matters for TRACE / reproducibility writeups: overstating which
inputs are restricted distorts the institutional-certification story.

Fixes #808.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Correct SIPP licensing language; only IRS-PUF is genuinely restricted