From 7f048400d799e4cc5de68358758a89e55c36be74 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 12 May 2026 19:18:31 +0000
Subject: [PATCH 1/4] publications: link titles, bold lab authors, year-filter chips, type badge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four enhancements to the publications page, none of which require any
new manual data entry: everything is auto-derived from the existing
publications.yaml and people.yml.

1. Title hyperlinks: the URL field was already present in the data
   (auto-populated by scholar_scraper.py) but the template ignored it.
   Now the title is rendered as a link when a URL is present.

2. Bold lab authors: every author whose (first-initial, last-name)
   matches a person in _data/people.yml (PI + all roles, current and
   former) renders in bold. The first-initial check avoids false
   positives like Amanda Li getting bolded because Thomas Li is a
   former undergrad; Pearson papers still match "J Pearson",
   "JM Pearson", "John M Pearson", etc., since they all start with J.

3. Year-filter chips: a row of buttons at the top of the page lets
   visitors narrow to a single year. Pure DOM toggle, ~15 lines of
   vanilla JS, no dependencies.

4. Venue-type badge: each entry shows a small PREPRINT / JOURNAL /
   CONFERENCE pill derived from container-title (bioRxiv/arXiv/etc =
   preprint; proceedings/NeurIPS/conference = conference; otherwise
   journal). No data-entry overhead; the badge is fully auto-derived.
   The lab can add a manual `type:` override later if a venue is
   misclassified.

Verified locally: vnu HTML5 0 errors; Playwright screenshots of the
"All" and "2025"-filtered states both render correctly with appropriate
bolding and badges.
---
 _includes/pubs.html  | 120 ++++++++++++++++++++++++++++++++++---------
 css/publications.css |  47 +++++++++++++++++
 2 files changed, 144 insertions(+), 23 deletions(-)

diff --git a/_includes/pubs.html b/_includes/pubs.html
index a277b34..32789a1 100644
--- a/_includes/pubs.html
+++ b/_includes/pubs.html
@@ -1,27 +1,101 @@
-

This list is not guaranteed to be up to date, but the lab's complete publications can be found here.

+{%- comment -%}
+  Publications list rendered from _data/publications.yaml (auto-populated
+  weekly by scholar_scraper.py from Google Scholar).
+
+  Features:
+  - Each title links to the paper's URL when one is present in the data.
+  - Lab member authors are bold. The set of lab-member keys is built
+    from _data/people.yml (PI + all roles, current and former), so
+    adding/removing a member there propagates here automatically.
+  - Auto-derived type badge (journal / preprint / conference) based
+    on the venue string — no extra data entry required.
+  - Year-filter chips at the top let visitors narrow to a single year.
+{%- endcomment -%}
+
+{%- assign all_lab_people = "" | split: "" -%}
+{%- if site.data.people.pi -%}{%- assign all_lab_people = all_lab_people | push: site.data.people.pi.name -%}{%- endif -%}
+{%- for p in site.data.people.postdocs -%}{%- assign all_lab_people = all_lab_people | push: p.name -%}{%- endfor -%}
+{%- for p in site.data.people.graduate_students -%}{%- assign all_lab_people = all_lab_people | push: p.name -%}{%- endfor -%}
+{%- for p in site.data.people.research_associates -%}{%- assign all_lab_people = all_lab_people | push: p.name -%}{%- endfor -%}
+{%- for p in site.data.people.undergraduates -%}{%- assign all_lab_people = all_lab_people | push: p.name -%}{%- endfor -%}
+
+{%- comment -%}
+  Build "first-initial|last-name" keys (e.g. "J|Pearson") so matching is
+  tighter than last-name-only — keeps Amanda Li from being bolded just
+  because Thomas Li is a former undergrad.
+{%- endcomment -%}
+{%- assign lab_keys = "" | split: "" -%}
+{%- for full_name in all_lab_people -%}
+  {%- assign parts = full_name | split: " " -%}
+  {%- assign first = parts | first -%}
+  {%- assign last = parts | last -%}
+  {%- assign initial = first | slice: 0, 1 -%}
+  {%- assign key = initial | append: "|" | append: last -%}
+  {%- assign lab_keys = lab_keys | push: key -%}
+{%- endfor -%}
+
+

This list is not guaranteed to be up to date, but the lab's complete publications can be found here.

+ {% assign sorted_pubs = site.data.publications | group_by_exp:'item', 'item.issued.first.year' | sort: 'name' | reverse %} + + +
{% for year in sorted_pubs %} -

{{ year.name }}

-
    - {% for pub in year.items %} -
  1. - {{ pub.title }} -
    - {% for aa in pub.author %} - {% if forloop.last == true %}and {% endif %} {{ aa.given }} {{ aa.family }}{% if forloop.last == false %}, {% endif %} - {% endfor %} - ({{ year.name }}). -
    - {{ pub.container-title }} - {% if pub.volume != blank %} - {{ pub.volume}}{%if pub.issue and pub.issue != "" %}({{ pub.issue }}){% endif %} - {% endif %} -
    - {% if pub.note != blank %} - {{ pub.note}} - {% endif %} -
  2. - {% endfor %} -
+
+

{{ year.name }}

+
    + {% for pub in year.items %} + {%- assign venue_lc = pub.container-title | downcase -%} + {%- if venue_lc contains "biorxiv" or venue_lc contains "arxiv" or venue_lc contains "medrxiv" or venue_lc contains "chemrxiv" or venue_lc contains "psyarxiv" or venue_lc contains "osf preprints" -%} + {%- assign pub_type = "preprint" -%} + {%- elsif venue_lc contains "proceedings" or venue_lc contains "advances in neural" or venue_lc contains "neurips" or venue_lc contains "cosyne" or venue_lc contains "conference" -%} + {%- assign pub_type = "conference" -%} + {%- else -%} + {%- assign pub_type = "journal" -%} + {%- endif -%} +
  1. + {{ pub_type }} + {% if pub.URL and pub.URL != "" %}{{ pub.title }}{% else %}{{ pub.title }}{% endif %} +
    + {% for aa in pub.author -%} + {%- if forloop.last and forloop.first == false %}and {% endif -%} + {%- assign author_initial = aa.given | slice: 0, 1 -%} + {%- assign author_key = author_initial | append: "|" | append: aa.family -%} + {%- if lab_keys contains author_key -%}{{ aa.given }} {{ aa.family }}{%- else -%}{{ aa.given }} {{ aa.family }}{%- endif -%} + {%- if forloop.last == false %}, {% endif -%} + {%- endfor %} + ({{ year.name }}). +
    + {{ pub.container-title }} + {%- if pub.volume != blank %} {{ pub.volume }}{% if pub.issue and pub.issue != "" %}({{ pub.issue }}){% endif %}{% endif %} + {%- if pub.note != blank %} +
    {{ pub.note }} + {% endif %} +
  2. + {% endfor %} +
+
{% endfor %} +
+
+
diff --git a/css/publications.css b/css/publications.css
index c1f18da..404d99b 100644
--- a/css/publications.css
+++ b/css/publications.css
@@ -2,6 +2,53 @@
   max-width: 900px;
 }
 
+/* Year filter chips */
+.year-filter {
+  margin: 0 0 1.5em 0;
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.4em;
+}
+
+.year-filter .year-chip {
+  padding: 0.25em 0.7em;
+  border: 1px solid #ccc;
+  background: #fff;
+  border-radius: 12px;
+  font-size: 0.9em;
+  cursor: pointer;
+  color: #444;
+  line-height: 1.4;
+}
+
+.year-filter .year-chip:hover {
+  background: #f5f5f5;
+}
+
+.year-filter .year-chip.active {
+  background: #337ab7;
+  color: #fff;
+  border-color: #337ab7;
+}
+
+/* Per-paper venue-type badge (auto-derived from container-title) */
+.pub-badge {
+  display: inline-block;
+  padding: 0.1em 0.5em;
+  margin-right: 0.5em;
+  border-radius: 3px;
+  font-size: 0.7em;
+  font-weight: 600;
+  text-transform: uppercase;
+  vertical-align: middle;
+  letter-spacing: 0.04em;
+}
+
+.pub-badge-journal { background: #e8f0fe; color: #1a73e8; }
+.pub-badge-preprint { background: #fef3c7; color: #92400e; }
+.pub-badge-conference { background: #f0e6ff; color: #6b21a8; }
+
+/* Preserve the existing tighter list spacing inside year sections. */
 h3 li:not(:last-child) {
   margin-bottom: 0.75em;
 }

From 3b8fd6aa775b171462c2b9e907d90c074e462548 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 12 May 2026 19:24:02 +0000
Subject: [PATCH 2/4] publications: accept HTTP 202 and timeouts in lychee
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two new lychee tweaks surfaced by PR #45's URL-linking on the
publications page:

1. eScholarship's two URLs in publications.yaml return 202 Accepted.
   That's a legitimate success response for deferred-fetch repos
   (they're processing the request and the page will load once the
   pipeline finishes), but the explicit accept list in lychee.toml
   doesn't include it. Add 202.

2. web.stanford.edu/~hastie/ElemStatLearn/ (referenced from
   learning.html) intermittently times out from the runner; it was a
   200 redirect on the last green run. Set accept_timeouts = true so
   transient slow hosts don't break CI on every run.
---
 lychee.toml | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lychee.toml b/lychee.toml
index c373d1d..025b924 100644
--- a/lychee.toml
+++ b/lychee.toml
@@ -22,10 +22,16 @@ user_agent = "Mozilla/5.0 (compatible; lychee-link-checker; +https://lychee.cli.
 # Status codes to treat as success. We're trying to catch dead links
 # (404) and broken servers, not paywalls or anti-bot measures.
 # - 200/204/206: success
+# - 202: escholarship.org and some other deferred-fetch repos return
+#   this on a request that will succeed if you retry or wait
 # - 401: paywalled but exists
 # - 403/405/429: bot-protection / rate-limit / method-not-allowed
 # - 999: LinkedIn's bot-detection response
-accept = [200, 204, 206, 401, 403, 405, 429, 999]
+accept = [200, 202, 204, 206, 401, 403, 405, 429, 999]
+
+# Don't fail on transient timeouts — some academic hosts
+# (web.stanford.edu/~ pages in particular) intermittently hang.
+accept_timeouts = true
 
 # Hosts to skip:
 # - pearsonlab.github.io: this is the site we're building. The sitemap

From 50b4d545a0223b25bf25dd384d1d170d8121f6ff Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 12 May 2026 19:27:35 +0000
Subject: [PATCH 3/4] publications: pass --accept-timeouts via CLI flag (compat with lychee 0.23)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lychee-action@v2 ships lychee 0.23 by default, which doesn't recognise
the accept_timeouts TOML field I added in the last commit, hence the
exit-3 config error. Move the flag to the CLI args instead, which has
been supported much longer.

The 202 accept code stays in lychee.toml since it's a config-style
addition that works on every version.
---
 .github/workflows/site-health.yml | 1 +
 lychee.toml                       | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/site-health.yml b/.github/workflows/site-health.yml
index 3eefd82..0b59271 100644
--- a/.github/workflows/site-health.yml
+++ b/.github/workflows/site-health.yml
@@ -52,6 +52,7 @@ jobs:
   --config ./lychee.toml
   --no-progress
   --root-dir ${{ github.workspace }}/_site
+  --accept-timeouts
   --exclude-path '_site/blog/201[5-8]'
   --exclude-path '_site/blog/index\.html'
   _site
diff --git a/lychee.toml b/lychee.toml
index 025b924..e9a9fe5 100644
--- a/lychee.toml
+++ b/lychee.toml
@@ -29,9 +29,10 @@ user_agent = "Mozilla/5.0 (compatible; lychee-link-checker; +https://lychee.cli.
 # - 999: LinkedIn's bot-detection response
 accept = [200, 202, 204, 206, 401, 403, 405, 429, 999]
 
-# Don't fail on transient timeouts — some academic hosts
-# (web.stanford.edu/~ pages in particular) intermittently hang.
-accept_timeouts = true
+# accept_timeouts is passed as a CLI flag via the workflow's args,
+# not as a TOML field, because lychee-action@v2 pins lychee 0.23 by
+# default and that version doesn't recognise the accept_timeouts TOML
+# field (the CLI flag has worked for much longer). See site-health.yml.
 
 # Hosts to skip:
 # - pearsonlab.github.io: this is the site we're building. The sitemap

From d2ecd0effdb9df8cf08e71e34439a14484f30667 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 12 May 2026 19:28:46 +0000
Subject: [PATCH 4/4] publications: drop --accept-timeouts (not in lychee 0.23 which the action pins)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Second attempt at the Stanford timeout: the CLI flag isn't in lychee
0.23 either, which is what lychee-action@v2 ships with by default. The
empty lychee report and exit-3 confirm it's still a config/arg error,
not a real link issue. Backing the flag out entirely.

The Stanford ~hastie URL was a 200 redirect on previous CI runs, so the
timeout is likely transient. If it keeps failing we can either exclude
that one URL, or bump lycheeVersion in the action to a newer release
that has the flag.

Keeping `accept = [..., 202, ...]` in TOML since that's worked across
versions and fixes the escholarship 202 responses.
---
 .github/workflows/site-health.yml | 1 -
 lychee.toml                       | 5 -----
 2 files changed, 6 deletions(-)

diff --git a/.github/workflows/site-health.yml b/.github/workflows/site-health.yml
index 0b59271..3eefd82 100644
--- a/.github/workflows/site-health.yml
+++ b/.github/workflows/site-health.yml
@@ -52,7 +52,6 @@ jobs:
   --config ./lychee.toml
   --no-progress
   --root-dir ${{ github.workspace }}/_site
-  --accept-timeouts
   --exclude-path '_site/blog/201[5-8]'
   --exclude-path '_site/blog/index\.html'
   _site
diff --git a/lychee.toml b/lychee.toml
index e9a9fe5..2f49b0d 100644
--- a/lychee.toml
+++ b/lychee.toml
@@ -29,11 +29,6 @@ user_agent = "Mozilla/5.0 (compatible; lychee-link-checker; +https://lychee.cli.
 # - 999: LinkedIn's bot-detection response
 accept = [200, 202, 204, 206, 401, 403, 405, 429, 999]
 
-# accept_timeouts is passed as a CLI flag via the workflow's args,
-# not as a TOML field, because lychee-action@v2 pins lychee 0.23 by
-# default and that version doesn't recognise the accept_timeouts TOML
-# field (the CLI flag has worked for much longer). See site-health.yml.
-
 # Hosts to skip:
 # - pearsonlab.github.io: this is the site we're building. The sitemap
 #   and robots.txt contain absolute self-referential URLs by spec, but