
data import script #203


Open · wants to merge 9 commits into master
60 changes: 56 additions & 4 deletions README.rst
@@ -2,13 +2,15 @@
WSI Superpixel Guided Labeling
==============================

WSI Superpixel Guided Labeling is a `Girder 3 <https://github.com/girder>`_ plugin designed to be used in conjunction with `HistomicsUI <https://github.com/DigitalSlideArchive/HistomicsUI>`_ and `HistomicsTK <https://github.com/DigitalSlideArchive/HistomicsTK>`_ to facilitate active learning on whole slide images.
WSI Superpixel Guided Labeling is a `Girder 3 <https://github.com/girder>`_ plugin for interactive development of image classifiers. It is designed to be used in conjunction with `HistomicsUI <https://github.com/DigitalSlideArchive/HistomicsUI>`_ and `HistomicsTK <https://github.com/DigitalSlideArchive/HistomicsTK>`_ and enables rapid development of classifiers with whole slide images using active learning.

This plugin leverages the output of certain HistomicsTK/SlicerCLI jobs to allow end users to label superpixel regions of whole slide images to be used as input for machine learning algorithms.
This plugin can be used to classify objects ranging from cell nuclei to high-power fields, and can operate on user-provided data or data from a built-in pipeline that parcellates a whole-slide image into superpixels (see ``dsarchive/superpixel:latest``).

An example algorithm is contained within the ``dsarchive/superpixel:latest`` docker image. This can be used to generate superpixels, features, and machine learning models for active learning on a directory of images. See the installation instructions below for how to include the image as part of your Digital Slide Archive deployment.
The `Installation`_ instructions below describe how to install the plugin for your existing `Digital Slide Archive deployment <https://github.com/DigitalSlideArchive/digital_slide_archive/tree/master/devops/dsa>`_.

Once the appropriate data is generated, a new view becomes available for labeling and retraining.
See the `Data Import`_ section for details on the data import script and data formatting.

.. contents:: Table of Contents:

Installation
------------
@@ -133,6 +135,56 @@ You can review trained or predicted superpixels via the review mode. This allows
.. image:: docs/screenshots/reviewmode.png
:alt: The review mode

Data Import
-----------
Users can provide their own data for use with the platform, allowing flexibility in the type of objects and in the methods of detection/segmentation and encoding. A command-line import script is provided to upload or import this data. The required file formats and script details are described below.

Data formats
~~~~~~~~~~~~~~~~~~~~~~~~
Each slide in the dataset requires four files:

whole-slide image (various formats)
Any format that is supported by `large image <https://girder.github.io/large_image/formats.html>`_ can be used.
feature (.h5)
This file contains a single array where each row is a feature embedding for the object. A single blank row should be prepended if the image contains non-object background pixels.
Collaborator review comment: instead of "the object" we could say "each superpixel"

Collaborator review comment: also for the rest of the paragraph

pixelmap (.tiff)
This image is used as a `pixelmap overlay <https://girder.github.io/large_image/annotations.html#tiled-pixelmap-overlays>`_ to define object locations for visualization and interactivity. Pixel values reflect the position of each object's embedding in the feature file. For an object whose embedding is in row 'i' of the feature array (zero-indexed), the pixels of that object should have the value 2i and its border pixels the value 2i+1. Non-object background pixels should be encoded with the value zero (a short sketch of this encoding follows this list).
An example pixelmap is below:

.. image:: docs/screenshots/pixelmap.png
:alt: Pixelmap example
bounding boxes (.csv)
Each row of this .csv defines the left, top, right, and bottom pixel for a single object. Objects should be listed in the same order as they appear in the feature.h5 file.
For the pixelmap example above, assuming (0,0) is the top left, the .csv file would contain the following line:

.. code-block:: csv

1,1,4,4
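
The value encoding can be sketched in a few lines of Python. The snippet below is an illustrative toy example only (it is not part of the plugin and assumes ``numpy`` is available): it builds a 6x6 pixelmap containing a single object whose embedding sits in row 1 of the feature array, with row 0 being the prepended blank background row.

.. code-block:: python

    import numpy as np

    i = 1                            # the object's row in the feature array
    pixelmap = np.zeros((6, 6), dtype=np.uint32)
    pixelmap[1:5, 1:5] = 2 * i + 1   # border pixels of the object get value 2i+1
    pixelmap[2:4, 2:4] = 2 * i       # interior pixels of the object get value 2i
    # Background pixels stay 0. With (0, 0) at the top left and inclusive
    # right/bottom coordinates, the matching bounding box CSV line is "1,1,4,4".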


Command-line Import Tool
~~~~~~~~~~~~~~~~~~~~~~~~
``data_import.py`` is provided to import or upload user-generated data into the plugin.

Import requires a csv file defining the paths to input files, an API key for your DSA instance, and a project name: ::

> python data_import.py inputs.csv UI65ixMezye0LpBOyYozArB9czPu3PLNpq0RGlGn new_project

Here, inputs.csv lists the whole-slide image, feature h5 file, pixelmap .tiff image, bounding box csv, and pixelmap downscale factor on each row: ::

> more inputs.csv
/remote/a.svs,/remote/a.svs.feature.h5,/remote/a.svs.pixelmap.tiff,/local/a.svs.boxes.csv,4
/remote/b.svs,/remote/b.svs.feature.h5,/remote/b.svs.pixelmap.tiff,/local/b.svs.boxes.csv,4

Feature h5 filenames should follow the pattern [slide_filename].*.feature.h5, but other filenames are unrestricted.

If importing data from DSA mounted storage, provide an identifier for the assetstore where the files are mounted using the -a option. This
identifier can be determined from the DSA Admin console.

-a, --assetstore Identifier for storage assetstore if importing files
-u, --url URL for server. Defaults to http://localhost:8080/api/v1
-r, --replace Replace existing wsis, features, or pixelmaps
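
For example, a hypothetical import from mounted storage on a remote deployment, replacing any existing items, might look like the following (the API key, assetstore identifier, and URL are placeholders): ::

    > python data_import.py inputs.csv <api_key> new_project -a <assetstore_id> -u http://dsa.example.com/api/v1 -r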

Features
--------

275 changes: 275 additions & 0 deletions data_import.py
@@ -0,0 +1,275 @@
import argparse
from girder_client import GirderClient
import hashlib
import os
from tqdm import tqdm


def _import_str(path, file, destination):
'''Format the arguments for the assetstore/id/import endpoint to
import a single file.'''
return dict(
importPath=path,
destinationId=destination,
destinationType='folder',
fileIncludeRegex=f'^{file}$',
progress=True,
)


def _import_file(client, assetstore, folder, file, replace=False):
'''Import a single file given the assetstore, filename and destination.
Optionally replace the item if an item with the same name exists.'''
match = list(client.listItem(folder, name=os.path.split(file)[1]))
import_args = _import_str(
os.path.split(file)[0], os.path.split(file)[1], folder
)
if len(match) and not replace:
return match[0]['_id']
elif len(match) and replace:
client.delete(f'item/{match[0]["_id"]}')
client.post(f'assetstore/{assetstore}/import', import_args)
match = list(client.listItem(folder, name=os.path.split(file)[1]))
assert len(match) == 1
return match[0]['_id']


def _upload_file(client, folder, file, replace=False):
    '''Upload a single file given the destination folder and filename.
Optionally replace the item if an item with the same name exists.'''
match = list(client.listItem(folder, name=os.path.split(file)[1]))
if len(match) and replace:
client.delete(f'item/{match[0]["_id"]}')
return client.uploadFileToFolder(folder, file)['_id']
elif len(match) and not replace:
return match[0]['_id']
else:
return client.uploadFileToFolder(folder, file)['_id']


def _feature_h5filename(wsi_id, boxes, patchsize=100):
'''Generate h5 feature filename.
wsi_id : str
The girder identifier of the associated whole slide image.
boxes : list of floats
Bounding boxes of objects in the order they appear in the feature array.
This is a 1D array/list with left, top, right, bottom in sequence for
each box at scan magnification.
patchsize : int
The size of the patches used during feature extraction.
'''
hashval = repr(
dict(itemId=wsi_id, bbox=[int(v) for v in boxes], patchSize=patchsize)
)
hash = hashlib.new('sha256', hashval.encode()).hexdigest()
return f'feature-{hash}.h5'


def pixelmap_annotation(pixelmap_id, scale, boxes):
'''Generate JSON format pixelmap annotation to attach to whole-slide image.
pixelmap_id : string
The girder identifier of the pixelmap image (not the whole-slide image).
scale : float
        Scaling ratio between the whole-slide and pixelmap image resolutions. For
        example, the scale for a WSI at 20X and a pixelmap at 5X would be 4.
    boxes : list of floats
        Flattened bounding boxes (left, top, right, bottom in sequence for each
        object) in the order the objects appear in the feature array.
    '''

    # one entry per bounding box; every object starts in the single 'default' category
    values = [0] * (len(boxes) // 4)
categories = [dict(
label="default", fillColor="rgba(0, 0, 0, 0)", strokeColor="rgba(0, 0, 0, 1)"
)]
transform = dict(
xoffset=0, yoffset=0, matrix=[[scale, 0], [0, scale]]
)
pixelmap = dict(
type='pixelmap',
girderId=pixelmap_id,
boundaries=True,
transform=transform,
values=values,
categories=categories,
        user={'bbox': boxes},
    )
    attr = dict(
        cli=None,
        metadata={},
        params={"scale_x": scale, "scale_y": scale},
        version=None
    )
    annotation = dict(
        name='Superpixel Epoch 0',
        elements=[pixelmap],
        attributes=attr
    )
return annotation
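
# For illustration only (the identifiers and numbers below are hypothetical): calling
# pixelmap_annotation('<pixelmap girder id>', 4, [1, 1, 4, 4, 6, 6, 9, 9]) for two
# objects returns an annotation named 'Superpixel Epoch 0' holding one 'pixelmap'
# element with values == [0, 0] (both objects start in the 'default' category), a
# 4x scaling transform, and the flattened boxes stored under element['user']['bbox'].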


def guided_label_import(
client, collection, wsis, features, pixelmaps, boxes, scales, assetstore=None, replace=False
):
'''Import or upload a guided labeling dataset to a digital slide archive instance.

:param client: An authenticated GirderClient object.
:param collection: Name of the project to create. If a collection with this name exists,
it will be used.
:param wsis: A list of paths to wsi filenames on local (upload) or mounted (import)
storage.
:param features: A list of paths to h5 feature files in the same order as `wsis`.
:param pixelmaps: A list of paths to tiff pixelmap image files in the same order as
`wsis`.
:param boxes: A list of 2D arrays containing the left, top, right, and bottom of the
bounding box for each object in each pixelmap. Coordinates should be listed at
native scan magnification. The order of objects in each 2D array should follow
the order of values in the corresponding pixelmap.
:param scales: The float ratios of resolutions between the whole-slide images and the
corresponding pixelmap images.
:param assetstore: The girder id of the assetstore if data will be imported.
Default value of `None` means data will be uploaded and that all paths in `wsis`,
`features`, and `pixelmaps` are local file paths.
    :param replace: If True, replace items during import or upload where filenames match.
'''

# if collection does not exist, create it, otherwise get collection id
match = client.get('collection', dict(text=collection, limit=0))
if len(match):
collection = match[0]['_id']
else:
collection = client.post('collection', dict(name=collection))['_id']

# construct folders if necessary
data_folder = client.loadOrCreateFolder('Data', collection, 'collection')['_id']
client.addMetadataToFolder(data_folder, {'active_learning': True})
feature_folder = client.loadOrCreateFolder('Features', data_folder, 'folder')['_id']
pixelmap_folder = client.loadOrCreateFolder('Annotations', data_folder, 'folder')['_id']
client.loadOrCreateFolder('Models', data_folder, 'folder')['_id']

# import if data assetstore provided, otherwise upload
wsi_ids = {}
feature_ids = {}
pixelmap_ids = {}
for w, f, p, b, s in tqdm(
zip(wsis, features, pixelmaps, boxes, scales), total=len(wsis),
desc='Importing' if assetstore else 'Uploading'
):
if assetstore:
wsi_ids[w] = _import_file(client, assetstore, data_folder, w, replace)
feature_ids[f] = _import_file(client, assetstore, feature_folder, f, replace)
pixelmap_ids[p] = _import_file(client, assetstore, pixelmap_folder, p, replace)
else:
wsi_ids[w] = _upload_file(client, data_folder, w, replace)
feature_ids[f] = _upload_file(client, feature_folder, f, replace)
            pixelmap_ids[p] = _upload_file(client, pixelmap_folder, p, replace)

# check for existing pixelmap annotation in wsi
# generate and post annotation if necessary
existing = client.get(f'/annotation/item/{wsi_ids[w]}')
document = pixelmap_annotation(
pixelmap_ids[p], s, [x for box in b for x in box]
)
if len(existing) and replace:
for annotation in existing:
client.delete(f'/annotation/{annotation["_id"]}')
client.post(f'/annotation?itemId={wsi_ids[w]}', json=document)
else:
skip = [
element['type'] == 'pixelmap'
for annotation in existing
for element in annotation['annotation']['elements']
]
if not any(skip):
client.post(f'/annotation?itemId={wsi_ids[w]}', json=document)
return collection
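
# Illustrative direct call (paths and ids are hypothetical; main() below normally
# builds these lists from the inputs CSV):
#
#   client = GirderClient(apiUrl='http://localhost:8080/api/v1')
#   client.authenticate(apiKey='<api key>')
#   guided_label_import(
#       client, 'new_project',
#       wsis=['/remote/a.svs'], features=['/remote/a.svs.feature.h5'],
#       pixelmaps=['/remote/a.svs.pixelmap.tiff'],
#       boxes=[[[1, 1, 4, 4]]], scales=[4.0],
#       assetstore='<assetstore id>', replace=False,
#   )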


def main():
parser = argparse.ArgumentParser(
description=(
'Import or upload a guided labeling dataset to a digital slide archive instance.'
)
)
parser.add_argument(
'input',
type=str,
help=(
'Comma separated file listing wsi, feature (h5), pixelmap (tiff), and bounding box '
'(csv - local) input files.'
),
)
parser.add_argument(
'key',
type=str,
help=(
'API key for the server (see /api/v1#/api_key/api_key_createKey_post_api_key).'
),
)
parser.add_argument(
'collection',
type=str,
help=(
'Name of the created collection / project.'
),
)
parser.add_argument(
'-u',
'--url',
type=str,
default='http://localhost:8080/api/v1',
help=(
            'Optional URL for the DSA API. Defaults to http://localhost:8080/api/v1.'
),
)
parser.add_argument(
'-a',
'--assetstore',
type=str,
help=(
'Optional identifier of the assetstore for file import. Defaults to upload (None).'
),
)
parser.add_argument(
'-r',
'--replace',
dest='replace',
action='store_true',
help=(
            'Optionally replace existing wsis, features, or pixelmaps. Defaults to no replacement.'
),
)
args = parser.parse_args()

# create and authenticate client
client = GirderClient(apiUrl=args.url)
client.authenticate(apiKey=args.key)

# parse input to build lists of files
inputs = []
with open(args.input, 'r') as f:
        for line in f:
            if not line.strip():
                continue
            wsi, feature, pixelmap, box, scales = line.strip().split(',')
            inputs.append((wsi, feature, pixelmap, box, scales))
wsis = [row[0] for row in inputs]
features = [row[1] for row in inputs]
pixelmaps = [row[2] for row in inputs]
boxes = [row[3] for row in inputs]
scales = [float(row[4]) for row in inputs]

# build list of bounding boxes
bounding = []
for b in boxes:
with open(b, 'r') as f:
box = [
[int(x) for x in line.strip().split(',')]
for line in f
]
bounding.append(box)

# import if assetstore defined, otherwise upload
guided_label_import(
client, args.collection,
wsis, features, pixelmaps, bounding, scales,
args.assetstore, args.replace
)


if __name__ == "__main__":
main()
Binary file added docs/screenshots/pixelmap.png