RSS ingestor #1276
Pull request overview
Adds RSS/Atom feed ingestion to TeSS by introducing dedicated ingestors for events and materials, including optional HTML feed discovery and support for several common metadata extensions (Dublin Core, RDF/Bioschemas, iTunes, Yahoo Media).
Changes:
- Introduce shared RSS/Atom ingestion helpers (`RSSIngestion`) plus reusable Dublin Core parsing/building (`DublinCoreIngestion`).
- Add new ingestors for event and material RSS/Atom feeds, including RDF/Bioschemas merge behavior and HTML alternate-feed discovery.
- Add RSS Media namespace support for Atom parsing and comprehensive unit tests for RSS/Atom ingestion and extensions.
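HTML alternate-feed discovery generally means looking for `<link rel="alternate">` tags whose `type` is a feed MIME type and resolving their `href` against the page URL. The sketch below illustrates that idea only; `discover_feed_urls` and `FEED_TYPES` are hypothetical names, not the PR's code, and the regex scan stands in for a proper HTML parser (such as Nokogiri) purely to keep the example dependency-free.

```ruby
require 'uri'

# Feed MIME types advertised via <link rel="alternate" type="..."> in HTML.
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Hypothetical helper: return the absolute feed URLs advertised by an HTML page.
# Regex scanning is a simplification for this sketch; a real implementation
# should parse the document with an HTML parser instead.
def discover_feed_urls(html, page_url)
  html.scan(/<link\b[^>]*>/i).filter_map do |tag|
    # Collect name="value" pairs from the tag, with lowercased attribute names.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
    rels = attrs.fetch('rel', '').downcase.split
    next unless rels.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type'].to_s.downcase)
    next if attrs['href'].to_s.strip.empty?

    # Resolve relative hrefs against the page the HTML was fetched from.
    URI.join(page_url, attrs['href'].strip).to_s
  end.uniq
end
```

Non-feed links (stylesheets, icons) are skipped, and relative `href` values such as `/feed.xml` become absolute URLs that can then be fetched as RSS or Atom.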
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/unit/rss_media_atom_test.rb | Tests Media namespace installation idempotency for Atom. |
| test/unit/ingestors/material_rss_ingestor_test.rb | Material RSS/Atom ingestion tests (DC, RSS versions, RDF/Bioschemas, HTML discovery, media/iTunes extensions). |
| test/unit/ingestors/event_rss_ingestor_test.rb | Event RSS/Atom ingestion tests (DC, relative links, RDF/Bioschemas, HTML discovery). |
| lib/rss/media.rb | Defines Yahoo Media RSS extension wiring + loads Atom-specific patch. |
| lib/rss/media/atom.rb | Patches Atom classes to support media:group parsing and makes namespace installation idempotent. |
| lib/ingestors/rss_ingestion.rb | Shared feed fetching/parsing + HTML discovery + extraction/merge helpers. |
| lib/ingestors/dublin_core_ingestion.rb | Centralized DC-to-OpenStruct builders and normalization helpers. |
| lib/ingestors/material_rss_ingestor.rb | New material RSS/Atom ingestor (RSS/RDF/Atom + Bioschemas LearningResource extraction). |
| lib/ingestors/event_rss_ingestor.rb | New event RSS/Atom ingestor (RSS/RDF/Atom + Bioschemas Event/Course extraction). |
| lib/ingestors/oai_pmh_ingestor.rb | Refactors OAI-PMH DC parsing to reuse DublinCoreIngestion. |
| lib/ingestors/ingestor_factory.rb | Registers the new RSS ingestors. |
| config/initializers/inflections.rb | Adds RSS acronym for correct Zeitwerk/inflector naming. |
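To illustrate what "DC-to-OpenStruct builders and normalization helpers" and the RDF/Bioschemas merge can look like, here is a minimal sketch under stated assumptions: the DC elements arrive as a Hash of strings or arrays, the first `http(s)` identifier is treated as the URL, and the merge fills only blank fields from the secondary record. All names (`Material`, `dc_values`, `build_material_from_dublin_core`, `merge_blank_fields`) are hypothetical, and a keyword `Struct` replaces OpenStruct so the example stays self-contained. The PR's actual precedence rules may differ.

```ruby
# Hypothetical record shape; the PR builds OpenStructs, a keyword Struct is used here.
Material = Struct.new(:title, :description, :url, :keywords, :authors, keyword_init: true)

# Normalise a DC element that may be nil, a String or an Array of Strings:
# strip whitespace, then drop blanks and duplicates.
def dc_values(value)
  Array(value).map { |v| v.to_s.strip }.reject(&:empty?).uniq
end

# Build a record from a Hash of DC elements (e.g. parsed from dc:* tags).
def build_material_from_dublin_core(dc)
  Material.new(
    title: dc_values(dc[:title]).first,
    description: dc_values(dc[:description]).first,
    # Assumption: the first http(s) identifier is the canonical URL.
    url: dc_values(dc[:identifier]).find { |id| id.match?(%r{\Ahttps?://}i) },
    keywords: dc_values(dc[:subject]),
    authors: dc_values(dc[:creator])
  )
end

# Assumed merge rule: copy a field from the secondary record (e.g. embedded
# RDF/Bioschemas) only when the primary record's value is nil or empty.
def merge_blank_fields(primary, secondary)
  primary.members.each do |field|
    current = primary[field]
    blank = current.nil? || (current.respond_to?(:empty?) && current.empty?)
    primary[field] = secondary[field] if blank
  end
  primary
end
```

Keeping normalization in one place is what lets both the OAI-PMH and the RSS ingestors reuse the same Dublin Core handling.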
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Two additional notes:
fbacall left a comment
Looks good - very flexible and nice tests. I think some parts can be simplified, and it might be good to split the YouTube functionality into a simple subclass for the sake of clarity.
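A minimal sketch of the suggested split, assuming the generic ingestor exposes an overridable hook that a YouTube subclass fills in. `FeedMaterialIngestor`, `YoutubeFeedIngestor`, `enrich` and the Hash-shaped entries are all hypothetical stand-ins for the real classes and parsed feed items.

```ruby
# Hypothetical generic ingestor: turns a parsed feed entry (a Hash here)
# into a material Hash, then passes it through an overridable hook.
class FeedMaterialIngestor
  def build_material(entry)
    material = { title: entry[:title], url: entry[:link] }
    enrich(material, entry)
  end

  private

  # Extension point; the generic ingestor adds nothing.
  def enrich(material, _entry)
    material
  end
end

# Hypothetical YouTube subclass: only the media:group handling lives here,
# so the generic RSS/Atom path stays free of YouTube-specific details.
class YoutubeFeedIngestor < FeedMaterialIngestor
  private

  def enrich(material, entry)
    media = entry.fetch(:media_group, {})
    material[:title] ||= media[:title]
    material[:description] = media[:description] if media[:description]
    material
  end
end
```

The base class stays readable on its own, and the YouTube behavior can be tested and registered as a separate ingestor.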
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
lib/ingestors/material_rss_ingestor.rb:227
`build_material_from_atom_item` always does `Addressable::URI.join(feed_url, ...)` even though `extract_atom_link` can return nil when an entry has no usable link. This can raise and abort ingestion. Guard for blank links (and preserve any URL already set from Dublin Core identifiers) before calling `Addressable::URI.join`.
```ruby
material = build_material_from_dublin_core_data(extract_dublin_core(item))
media_title = text_value(item.media_group&.media_title)
material.title ||= text_value(item.title) || media_title
material.url = Addressable::URI.join(feed_url, text_value(extract_atom_link(item))).to_s
media_group_description = text_value(item.media_group&.media_description)
```
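The guard the comment asks for can be sketched as follows. `resolve_entry_url` is a hypothetical helper, and stdlib `URI.join` is swapped in for `Addressable::URI.join` so the example runs without gems; the idea is simply to skip the join for a blank link and keep whatever URL the Dublin Core data already supplied.

```ruby
require 'uri'

# Hypothetical helper: resolve an entry link against the feed URL, but fall
# back to an already-known URL (e.g. from a DC identifier) when the entry has
# no usable link. Stdlib URI.join stands in for Addressable::URI.join here.
def resolve_entry_url(feed_url, link, existing_url = nil)
  link = link.to_s.strip
  return existing_url if link.empty?

  URI.join(feed_url, link).to_s
end
```

In the ingestor this would replace the unconditional join, e.g. `material.url = resolve_entry_url(feed_url, text_value(extract_atom_link(item)), material.url)`.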
I addressed all the comments.
Additionally, I removed the RSS ingestor for events in 6fe1eff. I'm not sure any RSS feed actually publishes events, the event metadata obtainable from RSS is of limited use, and supporting it made the implementation unnecessarily complex. If this is needed in the future it can be brought back relatively easily.
Summary of changes
`link` element with `application/rss+xml` or atom
Motivation and context
Closes #722
Screenshots

Checklist