Fix missing file sizes #12455

Open
qqmyers wants to merge 3 commits into IQSS:develop from QualitativeDataRepository:FixMissingFileSizes

Conversation

qqmyers commented Jun 11, 2026

Member

What this PR does / why we need it: QDR has found a production case where ~2000 files had no file size recorded in the database. We believe this can (or could) occur during S3 direct upload, when the code tries to retrieve the size from S3: if the S3 store is only 'eventually consistent', this call can fail, leaving a null file size in the database. (It is not clear whether this is still possible now or was only an issue in past releases.)

Overall, this state is non-fatal, but it does affect the reported download size when downloading a dataset: files with no recorded size add nothing to the total, so the download may be larger than shown.
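The effect on the reported total can be illustrated with a minimal Python sketch. This is not Dataverse's code; the function name `reported_total` and the sample sizes are hypothetical, and `None` stands in for a null filesize in the database.

```python
from typing import Iterable, Optional


def reported_total(sizes: Iterable[Optional[int]]) -> int:
    """Sum the known sizes; a None (null filesize) contributes nothing."""
    return sum(size for size in sizes if size is not None)


# Hypothetical dataset: three 100-byte files, one of which has a null size.
recorded_sizes = [100, None, 100]
actual_sizes = [100, 100, 100]

shown = reported_total(recorded_sizes)   # what the download size would report
real = sum(actual_sizes)                 # what is actually transferred
print(f"shown={shown} bytes, real={real} bytes")
```

In this example the reported size undercounts the download by exactly the size of the file whose size is missing.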

This PR adds an /api/admin/datafiles/integrity/fixmissingfilesizes endpoint (somewhat analogous to the existing fixmissingoriginalfilesizes one). For stores that indicate Dataverse can access the files, it tries to retrieve each missing size and update the database. Stores that do not qualify include, e.g., a remote store where the URL may be blocked or may point to a remote trusted store's landing page, whose size is not that of the real file.
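The behavior described above can be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the PR's Java implementation: `FileRecord`, `accessible_stores`, `fetch_size`, and the returned counters are all hypothetical names, and the real endpoint may report results differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class FileRecord:
    """Hypothetical stand-in for a datafile row."""
    file_id: int
    store_id: str
    storage_key: str
    filesize: Optional[int]


def fix_missing_file_sizes(
    files: Iterable[FileRecord],
    accessible_stores: Set[str],
    fetch_size: Callable[[FileRecord], Optional[int]],
) -> Dict[str, int]:
    """Fill in null sizes for files in stores that can be read directly."""
    counts = {"fixed": 0, "skipped": 0, "failed": 0}
    for record in files:
        if record.filesize is not None:
            continue  # size already recorded, nothing to do
        if record.store_id not in accessible_stores:
            counts["skipped"] += 1  # e.g. a remote store we cannot measure
            continue
        try:
            size = fetch_size(record)
        except OSError:
            size = None  # treat a storage error as "size still unknown"
        if size is None or size < 0:
            counts["failed"] += 1
            continue
        record.filesize = size  # in a real system: update the database row
        counts["fixed"] += 1
    return counts


# Usage with an in-memory "store" (all values hypothetical).
storage = {"bucket/a": 10, "bucket/b": 20}
files = [
    FileRecord(1, "s3", "bucket/a", None),
    FileRecord(2, "remote", "https://example.org/landing", None),
    FileRecord(3, "s3", "bucket/missing", None),
    FileRecord(4, "s3", "bucket/b", 20),
]
result = fix_missing_file_sizes(files, {"s3"}, lambda r: storage.get(r.storage_key))
print(result)
```

Keeping the store check separate from the size lookup means files in non-accessible stores are never measured, which avoids recording the size of a landing page as if it were the file's size.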

To decide whether this is a useful addition for the community, it may be worth checking whether any installations get a non-zero count from: select count(*) from datafile where filesize is null; (Feel free to comment on this PR if you see this in your installation.)

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this: It's not clear how to trigger this condition. The easiest approach is to set some filesize values to null in a test database and confirm that the API call restores them.
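For the API call step, a small Python helper along these lines might be used against a test installation. Only the endpoint path comes from this PR; the GET method is an assumption (by analogy with other admin integrity endpoints), the response is simply returned as text because its format is not described here, and `build_fix_request` and `run_fix` are hypothetical helper names.

```python
import urllib.request

# Path taken from the PR description.
ENDPOINT_PATH = "/api/admin/datafiles/integrity/fixmissingfilesizes"


def build_fix_request(base_url: str) -> urllib.request.Request:
    """Build the request; the HTTP method (GET) is an assumption."""
    url = base_url.rstrip("/") + ENDPOINT_PATH
    return urllib.request.Request(url, method="GET")


def run_fix(base_url: str, timeout: float = 300.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_fix_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

After nulling some filesize values in the test database, calling `run_fix("http://localhost:8080")` (assuming the admin API is reachable at that address) and re-running the count query should show whether the sizes were restored.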

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

qqmyers added the GDCC: QDR of interest to QDR label Jun 11, 2026

Labels

GDCC: QDR of interest to QDR


1 participant