data import script #203

Open
wants to merge 9 commits into
base: master

Conversation

cooperlab

Adds a command-line script for data import or upload and documentation of data formats.

cooperlab linked an issue on May 13, 2025 that may be closed by this pull request
@cooperlab
Author

cooperlab commented May 13, 2025

@abs711 update the pixelmap_annotation function to capture the additional information needed by the platform.

whole-slide image (various formats)
Any format that is supported by `large image <https://girder.github.io/large_image/formats.html>`_ can be used.
feature (.h5)
This file contains a single array where each row is a feature embedding for the object. A single blank row should be prepended if the image contains non-object background pixels.
Collaborator

instead of "the object" we could say "each superpixel"

Collaborator

also for the rest of the paragraph

@cooperlab
Author

@abs711 we need to check a few things:
- Are bounding boxes defined at the pixelmap image resolution, or the WSI resolution?
- Do we need a dummy bounding box for the background?

@andsild see if the superpixelSize is used anywhere after data generation.

@abs711
Collaborator

abs711 commented May 13, 2025

> @abs711 we need to check a few things:
> - Are bounding boxes defined at the pixelmap image resolution, or the WSI resolution?

They are defined at WSI resolution:

if str(opts.bounding).lower() not in {'', 'none'}:
    regions = skimage.measure.regionprops(1 + segments)
    for _pidx, props in enumerate(regions):
        by0, bx0, by1, bx1 = props.bbox
        bboxes.append((
            ((bx0 + bx1) / 2 + tx0) * scale + x0,
            ((by0 + by1) / 2 + ty0) * scale + y0,
            (bx1 - bx0) * scale,
            (by1 - by0) * scale))
        bboxesUser.extend([
            (bx0 + tx0) * scale + x0,
            (by0 + ty0) * scale + y0,
            (bx1 + tx0) * scale + x0,
            (by1 + ty0) * scale + y0,
        ])
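
To make the coordinate mapping in the excerpt easier to check, here is a minimal sketch of the corner transform on its own. It assumes that `tx0`/`ty0` are the tile's offset within the processed region, `scale` is the downsample factor relative to the WSI, and `x0`/`y0` are the region's origin in WSI pixels; those meanings are inferred from the variable names, and the helper `region_bbox_to_wsi` is hypothetical rather than part of the repository.

```python
def region_bbox_to_wsi(bbox, tile_offset, scale, region_offset):
    """Map a (y0, x0, y1, x1) box from tile pixels to WSI pixels.

    bbox follows the skimage regionprops order: (min_row, min_col, max_row, max_col).
    tile_offset and region_offset are (x, y) pairs; their meaning is an assumption.
    """
    by0, bx0, by1, bx1 = bbox
    tx0, ty0 = tile_offset
    x0, y0 = region_offset
    # Same arithmetic as the bboxesUser entries in the excerpt above.
    return (
        (bx0 + tx0) * scale + x0,
        (by0 + ty0) * scale + y0,
        (bx1 + tx0) * scale + x0,
        (by1 + ty0) * scale + y0,
    )


# Illustrative values only.
corners = region_bbox_to_wsi(
    (10, 20, 30, 60), tile_offset=(100, 200), scale=4, region_offset=(1000, 2000)
)
print(corners)  # (1480, 2840, 1640, 2920)
```

Because every term is multiplied by `scale` and shifted by the region origin, the resulting corners are in full-resolution WSI pixels rather than pixelmap pixels.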

@andsild
Collaborator

andsild commented May 13, 2025

> @andsild see if the superpixelSize is used anywhere after data generation.

TL;DR nowhere important as far as I can tell.

superpixelSize is one of the parameters defined in the MongoDB database (which is why we struggled with "missing jobId" last week). It would be fetched here:
https://github.com/girder/slicer_cli_web/blob/master/slicer_cli_web/rest_slicer_cli.py#L568

but seems to only really be used if you ask for heatmaps with predictions from a chain here:
https://github.com/DigitalSlideArchive/superpixel-classification/blob/main/superpixel_classification/SuperpixelClassification/SuperpixelClassificationBase.py#L844

Heatmaps are disabled by default for now, so I would say we don't have to address anything.

Probably not relevant now, but I'm not sure about scenarios like:

  1. a user uploads slides through this import job
  2. adds more slides in the UI
  3. asks UI to generate superpixels for new slides

I assume a default superpixelSize will be used for item 3, which may cause a mismatch with the imported data.

@cooperlab
Author

Great, thank you guys. I will update the README and will make a note of possible issues with superpixel size.

Someone who is uploading data is unlikely to generate additional data through the interface, and so it may be mostly irrelevant.

@andsild
Collaborator

andsild commented May 13, 2025

FYI I'm working on integrating my changes to work with AML for this and the superpixel_classification repository.
It should fix problems with background both in the UI and in the backend.

@abs711
Collaborator

abs711 commented May 13, 2025

> @abs711 update the pixelmap_annotation function to capture the additional information needed by the platform.

@manthey Can I have the permissions to push changes?

@manthey
Contributor

manthey commented May 14, 2025

> > @abs711 update the pixelmap_annotation function to capture the additional information needed by the platform.
>
> @manthey Can I have the permissions to push changes?

You should have an invite

@cooperlab
Author

@manthey how can progressCallback be used with tqdm in GirderClient.uploadFileToFolder?

@manthey
Contributor

manthey commented May 19, 2025

For any of the girder client upload functions that take a progress callback, you should be able to do something like:

with tqdm.tqdm(total=0) as pbar:
    def progFunc(prog):
        pbar.total = prog['total']
        pbar.n = prog['current']
        pbar.update(0)

    gc.uploadFileToFolder(..., progressCallback=progFunc)

Or as an obtuse lambda:

with tqdm.tqdm(total=0) as pbar:
    # Assignment statements are not allowed inside a lambda, so use setattr.
    gc.uploadFileToFolder(..., progressCallback=lambda prog: (setattr(pbar, 'total', prog['total']), setattr(pbar, 'n', prog['current']), pbar.update(0)))
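
For reuse across several uploads, the same idea can be wrapped in a small factory. This is only a sketch: `make_progress_callback` and `StubBar` are hypothetical names, and `StubBar` stands in for a tqdm bar so the example runs without tqdm or a Girder server. The `'total'` and `'current'` keys are taken from the snippet above.

```python
def make_progress_callback(pbar):
    """Return a callback that mirrors upload progress onto a progress bar.

    pbar is assumed to expose `total`, `n`, and `update()`, as a tqdm bar does.
    """
    def callback(prog):
        pbar.total = prog['total']
        pbar.n = prog['current']
        # Increment of 0: n was already set directly, this only lets the bar redraw.
        pbar.update(0)
    return callback


class StubBar:
    """Hypothetical stand-in for a tqdm bar, used only to exercise the callback."""

    def __init__(self):
        self.total = 0
        self.n = 0
        self.refreshes = 0

    def update(self, increment):
        self.n += increment
        self.refreshes += 1


bar = StubBar()
on_progress = make_progress_callback(bar)
on_progress({'total': 2048, 'current': 512})
on_progress({'total': 2048, 'current': 2048})
print(bar.n, bar.total)  # 2048 2048
```

With a real bar, the callback would presumably be created inside the `with tqdm.tqdm(total=0) as pbar:` block and passed as `progressCallback`, as in the snippet above.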

I'm not sure if

data_import.py Outdated
@@ -251,7 +251,7 @@ def main():
 features = [row[1] for row in inputs]
 pixelmaps = [row[2] for row in inputs]
 boxes = [row[3] for row in inputs]
-scales = [row[4] for row in inputs]
+scales = [int(row[4]) for row in inputs]
Author

@abs711 Should this be float? I don't think the scales will always be integer.
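
A small sketch of why `float` may be the safer conversion here. It assumes each row's fifth field arrives as a string (for example, read from a CSV or the command line); the file names and values below are made up for illustration.

```python
# Hypothetical input rows: (image, features, pixelmap, boxes, scale).
inputs = [
    ("slide_a.svs", "a.h5", "a.png", "a.csv", "4"),
    ("slide_b.svs", "b.h5", "b.png", "b.csv", "2.5"),
]

# float() accepts both integer-like and fractional strings.
scales = [float(row[4]) for row in inputs]
print(scales)  # [4.0, 2.5]

# int() rejects a fractional string outright.
try:
    int("2.5")
    int_rejects_fraction = False
except ValueError:
    int_rejects_fraction = True
print(int_rejects_fraction)  # True
```

If the scale were already a Python float rather than a string, `int(2.5)` would not fail but would silently truncate to `2`, which would shift the bounding boxes instead of raising an error.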

Development

Successfully merging this pull request may close these issues.

Import and upload script
4 participants