S3 fixes for non AWS instances#12454
Open
qqmyers wants to merge 5 commits into
Open
Conversation
Note BackBlaze B2 doesn't support tags and appears to treate this as an object delete call.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it: QDR has used storJ and is now switching to use BackBlaze B2.
This PR fixes a critical issue we found using B2 which may be relevant for other stores that do not support tagging (which we use to tag files as temporary until the upload completes). In short, while the .disable-tagging=true setting stops Dataverse from writing a tag, it still tried to delete the tag even with that flag (resulting in B2 deleting the file :-)). The change here only tries to delete the tag if we set it/disable-tagging is false/not set. (This appears to be the only change needed for B2 - I've added it to the list of compatible S3 implementations.)
For storJ, which works fine, we noticed that the /cleanStorage APi call to delete unnecessary files (i.e. upload attempts where the file was sent to S3 and the call to addFiles to Dataverse didn't happen/failed) fails when there are many files in the dataset (probably > 1000). This PR updates the code to send a maxKeys(1000) when any list of objects is done - apparently without this storJ doesn't send a continuation token.
With the fix for storJ and /cleanStorage, the PR refactors the code to list objects in S3 to reduce duplicate/unnecessary code.
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this: The changes have been tested at QDR. If more than regression testing is desired, one can get a free B2 account (up to 10GB) for testing and set up a B2 store - should work in creating datasets, etc. as any other S3 store. For storJ, the problem can only be seen with more than 1K files and only with the /cleanStorage api call - happy to try and provide details if someone wants to recreate the issue.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: