Replace publisher URLs with doi.org equivalents (313) #759
Merged
richarddushime merged 3 commits into main on May 7, 2026
Many references in content/ pointed directly at publisher pages (SAGE, ScienceDirect, Wiley, Springer, Royal Society, etc.) instead of using the doi.org form. Publishers bot-block automated link checks, which produced unhelpful "broken" reports for valid links.

This pass converts 207 of those URLs to https://doi.org/{DOI}, after verifying every replacement DOI resolves via the DOI Handle API:
  161 sciencedirect PII (PII -> DOI via Crossref alternative-id)
  33 psycnet doiLanding?doi=... (DOI is in the URL query string)
  6 royalsocietypublishing.org/doi/...
  3 link.springer.com/article|chapter|...
  2 journals.sagepub.com/doi/...
  1 onlinelibrary.wiley.com/doi/...
  1 tandfonline.com/doi/...

Files changed: content/OS-developing-world, content/adopting, content/educators-corner/004-Teaching-why-how-replication, content/neurodiversity-lessonbank/masterstools, content/reversals.

Not changed in this pass (~178 URLs remain):
  45 ScienceDirect PIIs that Crossref didn't have indexed (likely conference papers / supplements)
  87 psycnet /record/, /buy/, /fulltext/ URLs
  17 JSTOR /stable/ URLs (only 5 map cleanly to 10.2307/<id>)
  15 academic.oup.com /article-abstract/
  7 journals.lww.com
  plus a handful of other patterns
Each needs a per-URL lookup rather than a regex extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
👍 All image files/references (if any) are in webp format, in line with our policy.
Contributor
📝 Spell Check Results
Found 1 potential spelling issue(s) when checking 10 changed file(s):
| Line | Issue |
|---|---|
| 94 | pre-selected ==> preselected |
ℹ️ How to address these issues:
- Fix the typo: If it's a genuine typo, please correct it.
- Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
- False positive: If this is a false positive, please report it in the PR comments.
🤖 This check was performed by codespell
Contributor
Author
✅ Staging Deployment Status
This PR has been successfully deployed to staging as part of an aggregated deployment.
Deployed at: 2026-05-07 19:13:54 UTC
The staging site shows the combined state of all compatible open PRs.
Phase 2 of the publisher → doi.org conversion. The earlier pass relied
on regex extraction (DOI visible in the URL) plus Crossref's
alternative-id filter for ScienceDirect PIIs. This pass adds:
- Crossref alternative-id lookup for psycnet record/buy/fulltext IDs
(PsycInfo IDs are deposited as alternative identifiers for many
journal articles).
- Crossref alternative-id lookup for OUP article numbers.
- OpenAlex bibliographic search as a fallback for JSTOR stable IDs
and OUP article IDs that Crossref didn't index.
- Re-runs of the ScienceDirect PIIs that hit transient rate-limits
in the first pass.
Resolved (all 90 unique DOIs Handle-API verified):
57 psycnet record/buy (Crossref alt-id)
22 ScienceDirect PII (Crossref alt-id, prior misses retried)
14 OUP article (mostly OpenAlex)
10 JSTOR stable (OpenAlex)
3 psycnet fulltext (Crossref alt-id)
72 publisher URLs still remain in content/ — mostly psycnet
records and ScienceDirect PIIs that neither Crossref nor OpenAlex
recognises, plus 7 lww URLs (their ID format isn't indexed by either).
Those are likely conference papers, supplements, or content the
publisher never deposited. They'd need per-URL human verification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
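
A minimal sketch of the Crossref alternative-id lookup described in this commit, not the script actually used for the PR. It assumes the public Crossref REST endpoint `https://api.crossref.org/works` with its `alternative-id` filter; the function names and the "accept only a single unambiguous DOI" rule are illustrative assumptions.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_alt_id_url(alt_id: str, rows: int = 2) -> str:
    """Build a query for works whose alternative-id equals alt_id."""
    query = urlencode({"filter": f"alternative-id:{alt_id}", "rows": rows})
    return f"{CROSSREF_WORKS}?{query}"


def single_doi(payload: dict) -> Optional[str]:
    """Return the DOI only if the response names exactly one distinct DOI."""
    items = payload.get("message", {}).get("items", [])
    dois = {item["DOI"].lower() for item in items if "DOI" in item}
    return dois.pop() if len(dois) == 1 else None


def lookup_doi(alt_id: str) -> Optional[str]:
    """Network call (hypothetical helper): ask Crossref for the publisher ID."""
    with urlopen(crossref_alt_id_url(alt_id), timeout=30) as resp:
        return single_doi(json.load(resp))
```

Rejecting ambiguous responses keeps a wrong DOI from being written into the content; a miss simply leaves the URL for the manual pass.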
One psycnet doiLanding URL had a #:~:text= scroll-anchor appended, which my Phase 1 regex didn't strip. Tighten the stop set to include # and resolve the underlying DOI cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
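
The fix can be illustrated roughly as follows. This is a sketch under assumptions: the actual Phase 1 regex is not shown in the PR, so the pattern, the stop set, and the name `doi_from_landing` are hypothetical. The point is that `#` sits in the character class that ends the DOI, so a `#:~:text=` fragment is never captured.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Stop set (assumed): whitespace, the query separator "&", the fragment
# marker "#", and characters that typically close a link in markdown/HTML.
DOI_LANDING = re.compile(r"doiLanding\?doi=(10\.[^\s&#)\]>]+)")


def doi_from_landing(url: str) -> Optional[str]:
    """Extract the DOI from a psycnet doiLanding URL, ignoring any fragment."""
    match = DOI_LANDING.search(url)
    # The DOI may be percent-encoded in the query string (e.g. %2F for "/").
    return unquote(match.group(1)) if match else None
```

One caveat of this stop set: a DOI that legitimately contains `)` would be truncated, so such cases still need a manual check.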
Summary
Many references in `content/` pointed at publisher pages (SAGE, ScienceDirect, Wiley, Springer, Royal Society, OUP, JSTOR, psycnet, etc.) instead of `https://doi.org/{DOI}`. Publishers bot-block automated checks, so those links were producing unhelpful "broken" reports in the link checker even though the underlying papers are fine.
This PR converts 313 publisher URLs to their `doi.org` form, in two passes. Every replacement DOI was verified against the DOI Handle API before being applied — nothing was changed without confirming the new link resolves.
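
The verification step might look like the sketch below. It assumes the DOI proxy's REST endpoint at `https://doi.org/api/handles/{DOI}`, which reports `responseCode` 1 for a registered handle; the function names are hypothetical and the PR's own verification script is not shown.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

HANDLE_API = "https://doi.org/api/handles/"


def handle_api_url(doi: str) -> str:
    """Build the Handle API URL, percent-encoding everything except '/'."""
    return HANDLE_API + quote(doi, safe="/")


def handle_exists(payload: dict) -> bool:
    """responseCode 1 means the handle was found; anything else is a miss."""
    return payload.get("responseCode") == 1


def doi_resolves(doi: str) -> bool:
    """Network call: True only if the Handle API confirms the DOI."""
    try:
        with urlopen(handle_api_url(doi), timeout=30) as resp:
            return handle_exists(json.load(resp))
    except HTTPError:
        # Unknown handles come back as an HTTP error status.
        return False
```

Checking the Handle API rather than following the `doi.org` redirect avoids hitting the publisher sites that block automated requests in the first place.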
Pass 1 — DOI visible in URL (207)
Regex-extract from the URL itself, then verify.
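
For the URL shapes where the DOI sits in the path (`/doi/...`, `/article/...`, `/chapter/...`), the extraction could be sketched like this. The pattern and helper names are assumptions for illustration, and the DOIs in the usage note are placeholders, not real references.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Assumed shape: an optional view segment (abs/full/pdf/epdf) after /doi/,
# then a DOI starting with "10.", a 4-9 digit prefix, "/", and a suffix.
DOI_IN_PATH = re.compile(
    r"/(?:doi(?:/(?:abs|full|pdf|epdf))?|article|chapter)/(10\.\d{4,9}/.+)$"
)


def doi_from_path(url: str) -> Optional[str]:
    """Return the DOI embedded in the URL path, or None if there is none."""
    # urlsplit separates the path from the query string and fragment.
    path = unquote(urlsplit(url).path).rstrip("/")
    match = DOI_IN_PATH.search(path)
    return match.group(1) if match else None


def to_doi_url(url: str) -> str:
    """Rewrite to the doi.org form when a DOI is found; otherwise keep the URL."""
    doi = doi_from_path(url)
    return f"https://doi.org/{doi}" if doi else url
```

For example, `to_doi_url("https://journals.sagepub.com/doi/10.1234/example.5678")` yields `https://doi.org/10.1234/example.5678`, which would then still go through verification before being applied.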
Pass 2 — DOI looked up from URL ID (106)
Extract the publisher's record/article/stable ID, look it up via Crossref's `alternative-id` filter, fall back to OpenAlex bibliographic search for JSTOR/OUP. Verify via Handle API.
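
The ID-extraction half of this pass could be sketched as below. The URL patterns, the psycnet ID shape (`YYYY-NNNNN-NNN`), and all names are assumptions inferred from the URL forms mentioned in this PR; the IDs in the usage note are placeholders.

```python
import re
from typing import Optional, Tuple

# (kind, pattern) pairs; each pattern captures the publisher-side identifier.
ID_PATTERNS = [
    ("sciencedirect_pii",
     re.compile(r"sciencedirect\.com/science/article/(?:abs/)?pii/([A-Za-z0-9]+)")),
    ("psycnet_id",
     re.compile(r"psycnet\.apa\.org/(?:record|buy|fulltext)/(\d{4}-\d{5}-\d{3})")),
    # The lookahead rejects /stable/10.2307/... so "10" is never taken as an ID.
    ("jstor_stable",
     re.compile(r"jstor\.org/stable/(\d+)(?=[/?#]|$)")),
]


def publisher_id(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, identifier) for a recognised publisher URL, else None."""
    for kind, pattern in ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return kind, match.group(1)
    return None
```

For instance, `publisher_id("https://psycnet.apa.org/record/2000-00000-000")` returns `("psycnet_id", "2000-00000-000")`, and that identifier is what gets passed to the Crossref or OpenAlex lookup.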
Not addressed (~72 URLs left)
These need per-URL human verification — likely "the original article never had a DOI deposited" rather than a fixable typo.
Test plan
🤖 Generated with Claude Code