Package native QTI items as QTI/IMSCP archives#6029
Conversation
Route type=QTI assessment items through publish verbatim instead of rebuilding item XML from structured fields, since their authored XML is already stored in raw_data. - Validate raw_data against the QTI schema before packaging - Resolve and write referenced media alongside the item - Extract the media-reference scanner into its own module, shared with the AssessmentItem viewset - Refactor QTIExerciseGenerator's manifest bookkeeping to a generator-agnostic resource list so both the pydantic-built and native paths populate it Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Dispatch to QTIExerciseGenerator unless a node has a perseus_question item, instead of only when it has a free_response item. QTIExerciseGenerator already synthesizes valid QTI XML for every legacy structured-field type; perseus_question is the only type it can't represent. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
rtibbles
left a comment
There was a problem hiding this comment.
This is an excellent start, but a bit more cleanup is needed, I think.
| files_by_name = { | ||
| f"{f.checksum}.{f.file_format_id}": f for f in assessment_item.files.all() | ||
| } | ||
| missing = sorted(filenames - files_by_name.keys()) |
There was a problem hiding this comment.
What purpose does this serve? It breaks publishing for the sake of book keeping - we should log an error here, but carry on.
There was a problem hiding this comment.
Logs and continues now — _write_qti_media_files calls logging.error with the missing filenames instead of raising, and returns only the files that are actually linked. The item and the rest of the archive still publish; the raw_data still references the missing checksum but that reference was already dangling before this change reached publish.
| error_messages = "; ".join( | ||
| f"line {e.line}, column {e.column}: {e.message}" for e in result.errors | ||
| ) | ||
| raise ValueError( |
There was a problem hiding this comment.
So why not just skip it? Why raise a ValueError here and break the publishing?
There was a problem hiding this comment.
Skips the item now instead: on validation failure _create_native_qti_item logs an error and returns None; create_assessment_item propagates that, and process_assessment_item (base.py) skips writing/registering the item when the result is None. The rest of the node's items, and the rest of the channel, still publish.
Root cause for why this mattered: a ValueError out of create_exercise_archive() isn't caught anywhere in publish.py's recurse_nodes/publish_channel — it was propagating all the way out and aborting the entire channel publish, not just this node.
Flagging explicitly: this changes the issue's stated acceptance criterion ("failures stop publish with a clear error") and the plan's Task 3 note ("On failure, raise ValueError immediately"). Treating your live review as the more current signal on intended behavior here, but wanted the deviation on record rather than silent.
| ) | ||
|
|
||
| filepath = f"items/{identifier}.xml" | ||
| file_dependencies = self._write_qti_media_files(assessment_item) |
There was a problem hiding this comment.
I am not seeing where we are updating the media file paths to stop them just all sitting in the items folder bare. That's a significant change from the current export implementation.
There was a problem hiding this comment.
Intentional, not an omission. Native items keep raw_data byte-identical per the acceptance criteria, and the authored XML's src/href values are bare <checksum>.<ext> strings with no path prefix (see the anchored regex in qti/media.py's get_qti_media_references) — that's the TipTap editor's permanentSrc convention. Since we can't rewrite those attributes without breaking byte-identical fidelity, the referenced file has to physically sit next to the item XML (items/) for the bare relative reference to resolve; it can't go in items/images/ the way the legacy pydantic-built path does, since that path generates and owns its own XML and can prefix image paths freely.
No code change planned here — let me know if you want that byte-identical constraint relaxed instead.
There was a problem hiding this comment.
Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.
There was a problem hiding this comment.
Done — media is now written to items/images/ (matching the legacy layout) and the item XML's src/href/data/srcset values referencing those files are remapped to the images/ prefix that resolves correctly from the item's own location.
The rewrite (rewrite_qti_media_paths in qti/media.py) is a targeted text substitution over the raw XML string, not a parse/serialize round-trip — it only swaps the matched attribute-value characters for their new path, so it can't introduce incidental formatting changes (e.g. lxml re-serialization normalizes <img ... /> to <img .../>, which we verified and avoided). Items with no media references are unaffected and remain byte-identical to raw_data, as before.
Covered by new tests in test_media.py (rewrite_qti_media_paths behavior directly) and an updated test_native_qti_item_media_included_and_addressed in test_exercise_creation.py that asserts the zip layout, manifest, and exact rewritten item XML.
| ) | ||
|
|
||
|
|
||
| qti_reference_attributes = ("src", "href", "data") |
There was a problem hiding this comment.
Good consolidation from this file - I was worried at first we were duplicating this logic!
There was a problem hiding this comment.
Yep — get_filenames_from_qti_item now just delegates to get_qti_media_references in qti/media.py, shared with the native packaging path added in this PR.
A ValueError from QTIExerciseGenerator wasn't caught anywhere in publish.py's recurse_nodes/publish_channel, so any single failure aborted the entire channel publish rather than just the offending item. - Schema-invalid native items are now logged and excluded from the package instead of raising. - Media references with no linked File are now logged and dropped from the manifest instead of raising. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
|
I was unable to complete this task after multiple attempts. Manual review and intervention may be needed. |
rtibbles
left a comment
There was a problem hiding this comment.
I do, indeed, want better file organization.
| ) | ||
|
|
||
| filepath = f"items/{identifier}.xml" | ||
| file_dependencies = self._write_qti_media_files(assessment_item) |
There was a problem hiding this comment.
Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.
Media files referenced by native QTI items were being written bare into items/ alongside the item XML, unlike the legacy generator's items/images/ layout. Write them to items/images/ instead, and rewrite the item XML's src/href/data/srcset values to the images/ prefix that resolves to that directory. The rewrite is a targeted text substitution (not a parse/ serialize round-trip) so it changes only the remapped path values - everything else in the authored raw_data is preserved byte-for-byte. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
rtibbles
left a comment
There was a problem hiding this comment.
Looks good to me, having the extra nesting under images should help to keep things organized.
Summary
Exercise nodes whose assessment items are natively-authored QTI (
AssessmentItem.type == QTI) now publish by packaging the item XML already stored inraw_dataverbatim into a QTI/IMSCP zip, instead of rebuilding item XML from structured fields.Referenced media is written to
items/images/, matching the legacy pydantic path's layout. The item XML'ssrc/href/srcsetreferences are rewritten in place to the newimages/...path — the only bytes allowed to differ from the authoredraw_data.A native item that fails schema validation, or references media with no matching
Filerow, is logged and excluded from the package rather than aborting the whole publish.Dispatch also changes: a node routes to QTI packaging unless it has a
perseus_questionitem (previously: only when it had afree_responseitem).QTIExerciseGeneratoralready synthesizes valid QTI XML for every legacy structured-field type, so this is a deliberate broadening — nodes with only legacy structured-field items now get QTI packages instead of Perseus packages on next republish.References
Closes #6000. Builds on #6005 (QTI schema validation) and #5999 (QTI
raw_dataround-trip through save/sync).Reviewer guidance
archive.py_create_native_qti_item/_write_qti_media_files(archive.py:203-268): the remapping is a targeted text substitution (media.py'srewrite_qti_media_paths), not a parse/serialize round-trip, so every other byte ofraw_datais untouched.publish.py:366-373: please confirm the dispatch-rule broadening's scope is intended, not just the diff mechanics.ValueError.ValueErrorpropagates throughpublish_channeltoChannelViewSet.publish's existing error handling — no new plumbing needed.AI usage
Implemented with Claude Code from a maintainer-authored plan (issue comment), following TDD per task. Ran the full assessment/publish/QTI test suite and pre-commit hooks (flake8, reorder-python-imports, black) before each commit. Also ran an automated
/simplifypass over the diff, using 4 independent review agents across reuse, simplification, efficiency, and altitude angles. Applied the one actionable finding: reused an existing QTI XML test fixture instead of hand-rolling an equivalent one.@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
🟡 Waiting for feedback
Last updated: 2026-07-04 00:45 UTC