Replace publisher URLs with doi.org equivalents (313) #759

Merged
richarddushime merged 3 commits into main from convert-publisher-urls-to-doi
May 7, 2026

LukasWallrich commented May 1, 2026

Summary

Many references in `content/` pointed at publisher pages (SAGE, ScienceDirect, Wiley, Springer, Royal Society, OUP, JSTOR, psycnet, etc.) instead of `https://doi.org/{DOI}`. Publishers bot-block automated checks, so those links were producing unhelpful "broken" reports in the link checker even though the underlying papers are fine.

This PR converts 313 publisher URLs to their `doi.org` form, in two passes. Every replacement DOI was verified against the DOI Handle API before being applied — nothing was changed without confirming the new link resolves.
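
For readers who want to reproduce the verification step, here is a minimal sketch of a Handle API check. It is not the script used for this PR. It assumes the public `https://doi.org/api/handles/` endpoint and treats a DOI as resolving when the response has `responseCode` 1 and at least one `URL` value; the function names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

HANDLE_API = "https://doi.org/api/handles/"


def handle_api_url(doi: str) -> str:
    """Build the Handle API URL for a DOI, percent-encoding unsafe characters."""
    # Keep "/" literal so the prefix/suffix separator survives encoding.
    return HANDLE_API + quote(doi, safe="/")


def is_resolved(payload: dict) -> bool:
    """Interpret a Handle API JSON body (assumption: code 1 means 'found')."""
    if payload.get("responseCode") != 1:
        return False
    # Require a URL record so that the handle actually points somewhere.
    return any(v.get("type") == "URL" for v in payload.get("values", []))


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Query the Handle API; an HTTP error such as 404 counts as 'not resolved'."""
    try:
        with urlopen(handle_api_url(doi), timeout=timeout) as resp:
            return is_resolved(json.load(resp))
    except HTTPError:
        return False
```

Network failures other than HTTP errors are deliberately left to propagate, so a timeout is not mistaken for a missing DOI.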

Pass 1 — DOI visible in URL (207)

Regex-extract from the URL itself, then verify.

| Count | Source pattern | Notes |
|---|---|---|
| 161 | `sciencedirect.com/science/article/[abs/]?pii/` | PII → DOI via Crossref `alternative-id` filter |
| 33 | `psycnet.apa.org/doiLanding?doi=...` | DOI lifted from query string |
| 6 | `royalsocietypublishing.org/doi/...` | |
| 3 | `link.springer.com/article\|chapter\|...` | |
| 2 | `journals.sagepub.com/doi/...` | |
| 1 | `onlinelibrary.wiley.com/doi/...` | |
| 1 | `tandfonline.com/doi/...` | |
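
As an illustration of the regex extraction for the patterns where the DOI literally appears in the URL, a sketch along these lines would work. The regex, the function names, and the example DOIs in the usage note are assumptions for illustration, not the code or data from this PR. The pattern stops at whitespace, `?` and `#`, so a trailing `#:~:text=` scroll anchor is not swallowed into the DOI.

```python
import re
from urllib.parse import parse_qs, unquote, urlsplit

# Assumed DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix
# that runs until whitespace, "?" or "#".
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s?#]+")


def doi_from_url(url: str):
    """Return the DOI embedded in a publisher URL, or None if there is none."""
    parts = urlsplit(url)  # splits off the query string and the #fragment
    query_doi = parse_qs(parts.query).get("doi")
    # psycnet-style links carry the DOI in ?doi=...; others carry it in the path.
    candidate = query_doi[0] if query_doi else unquote(parts.path)
    match = DOI_RE.search(candidate)
    return match.group(0).rstrip(".,;") if match else None


def to_doi_url(url: str):
    """Rewrite a publisher URL to its doi.org form when a DOI can be extracted."""
    doi = doi_from_url(url)
    return f"https://doi.org/{doi}" if doi else None
```

With a made-up DOI, `to_doi_url("https://psycnet.apa.org/doiLanding?doi=10.1037%2Fxyz0000001#:~:text=foo")` gives `https://doi.org/10.1037/xyz0000001`. Paths that append a view suffix after the DOI (for example a trailing `/full`) would need extra trimming, and every result should still be verified before use.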

Pass 2 — DOI looked up from URL ID (106)

Extract the publisher's record/article/stable ID, look it up via Crossref's `alternative-id` filter, fall back to OpenAlex bibliographic search for JSTOR/OUP. Verify via Handle API.

| Count | Source pattern | Notes |
|---|---|---|
| 57 | `psycnet.apa.org/{record,buy}/` | Crossref alt-id |
| 22 | `sciencedirect.com/.../pii/` | prior rate-limit misses retried |
| 14 | `academic.oup.com//article-abstract/.../` | mostly OpenAlex |
| 10 | `jstor.org/stable/` | OpenAlex |
| 3 | `psycnet.apa.org/fulltext/.html` | |
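
The Crossref lookup can be sketched roughly as follows for the ScienceDirect case. This is an illustrative reconstruction, not the script behind this PR: it assumes the Crossref REST API `works` endpoint with the `alternative-id` filter and the `select` parameter, and it only accepts a result when exactly one distinct DOI comes back. The helper names and the PII regex are hypothetical.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"
# Assumed PII shape in ScienceDirect URLs: alphanumerics after "/pii/".
PII_RE = re.compile(r"/pii/([A-Z0-9]+)", re.IGNORECASE)


def pii_from_url(url: str):
    """Pull the PII out of a ScienceDirect article URL, or return None."""
    match = PII_RE.search(url)
    return match.group(1).upper() if match else None


def crossref_alt_id_query(alt_id: str, rows: int = 2) -> str:
    """Build a works query filtered by alternative-id, returning only DOIs."""
    params = {"filter": f"alternative-id:{alt_id}", "rows": rows, "select": "DOI"}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def unique_doi(payload: dict):
    """Return the DOI only if the response contains exactly one distinct DOI."""
    items = payload.get("message", {}).get("items", [])
    dois = {item["DOI"].lower() for item in items if "DOI" in item}
    return dois.pop() if len(dois) == 1 else None


def lookup_doi(url: str, timeout: float = 10.0):
    """Resolve a ScienceDirect URL to a DOI via Crossref (network call)."""
    pii = pii_from_url(url)
    if pii is None:
        return None
    with urlopen(crossref_alt_id_query(pii), timeout=timeout) as resp:
        return unique_doi(json.load(resp))
```

Requesting two rows instead of one makes ambiguous matches detectable, so they can be left for manual review. A rate-limited request raises an HTTP error here, which is why a retry pass like the one in the table above can recover earlier misses.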

Not addressed (~72 URLs left)

  • ~27 psycnet record/fulltext that neither Crossref nor OpenAlex indexes (likely book chapters, dissertations, or non-journal records).
  • ~23 ScienceDirect PIIs that neither service recognises (conference proceedings, supplements).
  • 7 JSTOR stable IDs that don't map to `10.2307/` (older content predating DOI deposit).
  • 7 `journals.lww.com/...` (lww IDs aren't indexed in Crossref or OpenAlex).
  • A handful of others.

These need per-URL human verification — likely "the original article never had a DOI deposited" rather than a fixable typo.
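
For the JSTOR leftovers, one way to triage is to build the candidate `10.2307/` DOI from the stable ID and see whether it resolves. The sketch below only constructs the candidate; the regex and function name are hypothetical, and as noted above the mapping does not hold for every stable ID, so each candidate still needs Handle API verification and a human check that it is the right paper.

```python
import re

# Match a purely numeric stable ID that ends the path segment, so stable URLs
# that already embed a DOI-style ID (e.g. /stable/10.xxxx/...) are skipped.
STABLE_RE = re.compile(r"jstor\.org/stable/(\d+)(?=$|[/?#])")


def candidate_jstor_doi(url: str):
    """Return the candidate 10.2307/<id> DOI for a JSTOR stable URL, or None."""
    match = STABLE_RE.search(url)
    return f"10.2307/{match.group(1)}" if match else None
```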

Test plan

  • Spot-check a couple of converted DOIs in the rendered site (do they go to the right paper?)
  • Re-run the Link Checker workflow after merging both this PR and #757 ("Link Checker: rolling issue + Crossref DOI validation"); the "Publisher URLs that should use doi.org" section should be ~313 entries shorter.

🤖 Generated with Claude Code

Many references in content/ pointed directly at publisher pages
(SAGE, ScienceDirect, Wiley, Springer, Royal Society, etc.) instead
of using the doi.org form. Publishers bot-block automated link
checks, which produced unhelpful "broken" reports for valid links.

This pass converts 207 of those URLs to https://doi.org/{DOI}, after
verifying every replacement DOI resolves via the DOI Handle API:

  161  sciencedirect PII (PII -> DOI via Crossref alternative-id)
   33  psycnet doiLanding?doi=... (DOI is in the URL query string)
    6  royalsocietypublishing.org/doi/...
    3  link.springer.com/article|chapter|...
    2  journals.sagepub.com/doi/...
    1  onlinelibrary.wiley.com/doi/...
    1  tandfonline.com/doi/...

Files changed: content/OS-developing-world, content/adopting,
content/educators-corner/004-Teaching-why-how-replication,
content/neurodiversity-lessonbank/masterstools, content/reversals.

Not changed in this pass (~178 URLs remain): 45 ScienceDirect PIIs
that Crossref didn't have indexed (likely conference papers /
supplements), 87 psycnet /record/, /buy/, /fulltext/ URLs, 17
JSTOR /stable/ URLs (only 5 map cleanly to 10.2307/<id>), 15
academic.oup.com /article-abstract/, 7 journals.lww.com, plus a
handful of other patterns. Each needs a per-URL lookup rather
than a regex extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LukasWallrich requested a review from a team as a code owner May 1, 2026 16:28

github-actions Bot commented May 1, 2026

👍 All image files/references (if any) are in webp format, in line with our policy.

github-actions Bot commented May 1, 2026

📝 Spell Check Results

Found 1 potential spelling issue(s) when checking 10 changed file(s):

📄 content/educators-corner/004-Teaching-why-how-replication/index.md

| Line | Issue |
|---|---|
| 94 | pre-selected ==> preselected |

ℹ️ How to address these issues:

  1. Fix the typo: If it's a genuine typo, please correct it.
  2. Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
  3. False positive: If this is a false positive, please report it in the PR comments.

🤖 This check was performed by codespell

LukasWallrich commented May 1, 2026

Staging Deployment Status

This PR has been successfully deployed to staging as part of an aggregated deployment.

Deployed at: 2026-05-07 19:13:54 UTC
Staging URL: https://staging.forrt.org

The staging site shows the combined state of all compatible open PRs.

Phase 2 of the publisher → doi.org conversion. Earlier pass relied
purely on regex extraction (DOI visible in URL) plus Crossref's
alternative-id filter for ScienceDirect PIIs. This pass adds:

- Crossref alternative-id lookup for psycnet record/buy/fulltext IDs
  (PsycInfo IDs are deposited as alternative identifiers for many
  journal articles).
- Crossref alternative-id lookup for OUP article numbers.
- OpenAlex bibliographic search as a fallback for JSTOR stable IDs
  and OUP article IDs that Crossref didn't index.
- Re-runs of the ScienceDirect PIIs that hit transient rate-limits
  in the first pass.

Resolved (all 90 unique DOIs Handle-API verified):
   57  psycnet record/buy   (Crossref alt-id)
   22  ScienceDirect PII    (Crossref alt-id, prior misses retried)
   14  OUP article          (mostly OpenAlex)
   10  JSTOR stable         (OpenAlex)
    3  psycnet fulltext     (Crossref alt-id)

72 publisher URLs still remain in content/ — mostly psycnet
records and ScienceDirect PIIs that neither Crossref nor OpenAlex
recognises, plus 7 lww URLs (their ID format isn't indexed by either).
Those are likely conference papers, supplements, or content the
publisher never deposited. They'd need per-URL human verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LukasWallrich changed the title from "Replace publisher URLs with doi.org equivalents (207)" to "Replace publisher URLs with doi.org equivalents (313)" May 1, 2026
One psycnet doiLanding URL had a #:~:text= scroll-anchor appended,
which my Phase 1 regex didn't strip. Tighten the stop set to include
# and resolve the underlying DOI cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

richarddushime left a comment

LGTM 👍

richarddushime merged commit ebf6993 into main May 7, 2026
5 checks passed
richarddushime deleted the convert-publisher-urls-to-doi branch May 7, 2026 19:15