AMR indiv and fungal large files uploads management #367

Draft
anagperal wants to merge 14 commits into development from feature/amr-indiv-fungal-uploads-file-chunk

@anagperal anagperal commented Dec 16, 2025

📌 References

BASED ON CODE FROM #363 by @MatiasArriola

📝 Implementation

  • install papaparse
  • read CSV in chunks
  • allow importing tracker entities in sync mode
  • in server uploads for AMR indiv and AMR fungal: use chunks to manage RIS files and save tracker entities in sync mode
  • Use concurrency when importing in async uploads
  • Add skipSideEffects true in async uploads
  • Add Stop-on-import-error

🔁 Async Uploads Workflow (AMR – Individual / AMR – Fungal)

This document describes, step by step, how the server-side async upload script imports a
primary CSV file into DHIS2. The script processes uploads that were queued in the Datastore,
validating the whole file before importing and sending the data to the server in chunks.

The entry point is the uploadDatasets routine in src/scripts/cliAsyncUploads.ts.

One continuous sequence

Processing the queue
  1. The script takes the pending uploads that still have retry attempts left and processes them
    one at a time — there is no parallelism between uploads; all chunking and concurrency
    happens inside a single upload.
  2. For each upload it pauses about half a second, then re-checks that the upload is still
    marked pending. If it is not, the upload is skipped.
  3. If still pending, it loads the upload's record, determines whether it is a primary or
    secondary file, and finds its module (a missing module is an error).
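
The queue handling above can be sketched roughly as follows. This is a minimal illustration, not the code in src/scripts/cliAsyncUploads.ts: `QueueItem`, `QueueDeps`, `processQueue`, `isStillPending` and `processUpload` are hypothetical names, and the 500 ms pause and the limit of 3 attempts are taken from the description.

```typescript
// Hypothetical shapes for illustration; the real project types may differ.
type QueueItem = { uploadId: string; attempts: number };

interface QueueDeps {
    maxAttempts: number; // assumed default: 3
    isStillPending: (uploadId: string) => Promise<boolean>;
    processUpload: (uploadId: string) => Promise<void>;
    delayMs?: number; // pause before each upload, about 500 ms
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Processes uploads strictly one after another and returns the IDs that were processed.
async function processQueue(queue: QueueItem[], deps: QueueDeps): Promise<string[]> {
    const processed: string[] = [];
    // Only uploads that still have retry attempts left are considered.
    const eligible = queue.filter(item => item.attempts < deps.maxAttempts);

    for (const item of eligible) {
        await sleep(deps.delayMs === undefined ? 500 : deps.delayMs);
        // The status may have changed since the queue was read, so check again.
        if (!(await deps.isStillPending(item.uploadId))) continue;
        await deps.processUpload(item.uploadId);
        processed.push(item.uploadId);
    }
    return processed;
}
```

Awaiting inside a plain `for` loop is what keeps the uploads sequential; concurrency only appears later, inside a single upload.
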
Preparing the file
  1. The upload is marked "uploading" and its CSV is downloaded from storage.
  2. The configured upload chunk size is read: the number of rows sent to the server per request.
    It comes from the module's Datastore config (default 100; values of 300 and 500 are currently being tested).
  3. The program's configuration and validation rules are fetched once, up front.
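
As an illustration of the chunk-size lookup, the sketch below falls back to 100 when the module config has no usable value. The `ModuleConfig` shape and the `chunkSizes.primaryUpload` key are assumptions made for this example; the actual Datastore keys may be named differently.

```typescript
// Hypothetical shape of a module entry in the Datastore config.
type ModuleConfig = { name: string; chunkSizes?: { primaryUpload?: number } };

const DEFAULT_UPLOAD_CHUNK_SIZE = 100;

// Returns the configured rows-per-request, or the default when it is missing or invalid.
function resolveUploadChunkSize(module: ModuleConfig): number {
    const configured = module.chunkSizes?.primaryUpload;
    if (typeof configured === "number" && configured > 0 && configured % 1 === 0) {
        return configured;
    }
    return DEFAULT_UPLOAD_CHUNK_SIZE;
}
```
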
Validation pass (whole file, before anything is imported)
  1. The file is read in large blocks of 5,000 rows. Each block is fully validated — custom
    checks plus program-rule checks.
  2. This pass is fail-fast: the instant a blocking error appears, reading stops, an error report
    is saved, and the import is skipped entirely. Nothing is imported unless the whole file has
    passed validation.
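
A minimal sketch of the fail-fast validation pass, under simplifying assumptions: rows are already in memory (the real script streams the CSV), and `BlockValidator`, `ValidationIssue` and `validateWholeFile` are hypothetical names standing in for the custom and program-rule checks.

```typescript
type Row = Record<string, string>;
type ValidationIssue = { row: number; message: string; blocking: boolean };
// A validator receives one block plus the index of its first row in the file.
type BlockValidator = (rows: Row[], firstRowIndex: number) => ValidationIssue[];
type ValidationOutcome = { canImport: boolean; blocksRead: number; issues: ValidationIssue[] };

const VALIDATION_BLOCK_SIZE = 5000;

function validateWholeFile(
    rows: Row[],
    validators: BlockValidator[],
    blockSize: number = VALIDATION_BLOCK_SIZE
): ValidationOutcome {
    const issues: ValidationIssue[] = [];
    let blocksRead = 0;

    for (let start = 0; start < rows.length; start += blockSize) {
        const block = rows.slice(start, start + blockSize);
        blocksRead += 1;
        for (const validate of validators) {
            issues.push(...validate(block, start));
        }
        // Fail fast: stop reading as soon as a block yields a blocking issue.
        if (issues.some(issue => issue.blocking)) {
            return { canImport: false, blocksRead, issues };
        }
    }
    return { canImport: true, blocksRead, issues };
}
```

The import pass would only start when `canImport` is true, which is what keeps a partially valid file from being imported.
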
Import pass (only if validation passed completely)
  1. The file is read again in the same 5,000-row blocks.
  2. Each block is split into smaller chunks of the configured upload chunk size (300/500).
  3. Those chunks are sent to the server in synchronous mode, 6 at a time, with the server's
    rule engine and side effects turned off (the rules were already checked in the validation
    pass). The concurrency is batched: six chunks are sent, the script waits for all six to
    finish, then sends the next six.
  4. Stop-on-import-error: if any chunk fails, the process stops immediately — no further
    chunks or blocks are imported — keeping whatever succeeded so far.
  5. As it goes, it accumulates the results and the IDs of the created records. At the end, if any
    records were imported, their ID list is saved to a file and linked to the upload, and the
    import summaries are saved.
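
The chunking, batched concurrency and stop-on-import-error behaviour of one block can be sketched as below. `SendChunk`, `ChunkResult` and `importBlock` are hypothetical names; `send` stands in for the synchronous tracker import request, which is not shown here.

```typescript
type ChunkResult = { ok: boolean; createdIds: string[] };
type SendChunk<T> = (chunk: T[]) => Promise<ChunkResult>;
type BlockImportResult = { stopped: boolean; createdIds: string[] };

function splitIntoChunks<T>(rows: T[], size: number): T[][] {
    const chunks: T[][] = [];
    for (let i = 0; i < rows.length; i += size) {
        chunks.push(rows.slice(i, i + size));
    }
    return chunks;
}

// Sends the chunks of one block in batches of `concurrency` and stops after the first failing batch.
async function importBlock<T>(
    rows: T[],
    chunkSize: number,
    concurrency: number,
    send: SendChunk<T>
): Promise<BlockImportResult> {
    const createdIds: string[] = [];
    const chunks = splitIntoChunks(rows, chunkSize);

    for (let i = 0; i < chunks.length; i += concurrency) {
        const batch = chunks.slice(i, i + concurrency);
        // Batched concurrency: wait for the whole batch before starting the next one.
        const results = await Promise.all(
            batch.map(chunk => send(chunk).catch((): ChunkResult => ({ ok: false, createdIds: [] })))
        );
        // Keep whatever succeeded, including other chunks of a failing batch.
        for (const result of results) {
            createdIds.push(...result.createdIds);
        }
        if (results.some(result => !result.ok)) {
            return { stopped: true, createdIds };
        }
    }
    return { stopped: false, createdIds };
}
```

In this sketch a failure is only noticed once its batch has finished, so the other chunks of that batch may still be imported. A caller would check `stopped` before reading the next block.
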
Finishing the upload
  1. The accumulated results are inspected for blocking errors and whether any rows were actually
    imported.
  2. The upload's final status is set accordingly — validated (clean), imported (partial, with
    errors), or uploaded (nothing imported) — and the upload is removed from the queue.
  3. If anything failed during the whole sequence, the upload's attempt counter is increased
    instead; once it reaches the maximum (3 by default), it is removed from the queue and marked
    failed, otherwise it stays for a later run to retry.
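
One possible reading of the finishing rules, written as two small pure functions. The lowercase status strings and the names `resolveFinalStatus` and `registerFailure` are assumptions for this sketch; the project may use different identifiers.

```typescript
type FinalStatus = "validated" | "imported" | "uploaded";

// Maps the accumulated results to the final status described above.
function resolveFinalStatus(hasBlockingErrors: boolean, importedCount: number): FinalStatus {
    if (importedCount === 0) return "uploaded"; // nothing imported
    return hasBlockingErrors ? "imported" : "validated"; // partial with errors vs clean
}

type RetryDecision = { attempts: number; action: "retry-later" | "mark-failed" };

// Increases the attempt counter and decides whether the upload stays in the queue.
function registerFailure(attempts: number, maxAttempts: number = 3): RetryDecision {
    const next = attempts + 1;
    return { attempts: next, action: next >= maxAttempts ? "mark-failed" : "retry-later" };
}
```
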

Secondary files follow the same shape, except AMR – Individual does a trial run first and
only imports for real if the trial finds no blocking errors.

The knobs, in one place

| Knob | Value | Notes |
| --- | --- | --- |
| Uploads | one at a time | No concurrency between uploads. |
| File blocks | 5,000 rows (fixed) | Used for both the validation and import passes. |
| Upload chunk size | 300 / 500 (default 100) | Datastore config; rows per server request. |
| Concurrency | 6 chunks, batched | Currently hardcoded. |
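
To see how these knobs interact, the sketch below counts blocks, requests and batches for a given file size. It assumes that batching restarts for every 5,000-row block, which is how the workflow above reads; `requestPlan` is a hypothetical helper, not part of the PR.

```typescript
type RequestPlan = { blocks: number; requests: number; batches: number };

// Counts how many blocks, server requests and concurrent batches a file would need.
function requestPlan(
    totalRows: number,
    chunkSize: number,
    blockSize: number = 5000,
    concurrency: number = 6
): RequestPlan {
    let blocks = 0;
    let requests = 0;
    let batches = 0;
    for (let start = 0; start < totalRows; start += blockSize) {
        const rowsInBlock = Math.min(blockSize, totalRows - start);
        const chunksInBlock = Math.ceil(rowsInBlock / chunkSize);
        blocks += 1;
        requests += chunksInBlock;
        batches += Math.ceil(chunksInBlock / concurrency);
    }
    return { blocks, requests, batches };
}
```

For a hypothetical 12,000-row file, a chunk size of 500 gives 3 blocks, 24 requests and 5 batches; a chunk size of 300 gives 41 requests in 8 batches; the default of 100 gives 120 requests in 22 batches.
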

📹 Screenshots/Screen capture

🔥 Testing

transform_JPN_INDIV_2023.sh

@anagperal anagperal changed the title AMR indiv and fungal large files uploads/deletions management AMR indiv and fungal large files uploads management Dec 17, 2025
ifoche commented May 8, 2026
