Create a torch dataset out of a selection of samples #5

@andandandand

Goal

Enable selection of samples in the HyperView interface, and provide an option to export the selected samples as a Torch-compatible dataset. This will support both prototyping and downstream ML workflows.

Requirements

  • User should be able to select samples from the data view.
  • Provide export functionality (button/menu) for selected samples.
  • Exported dataset should be in a format readily usable by PyTorch (torch.utils.data.Dataset).
  • Document the data schema and any requirements for serialization (e.g., images, labels, metadata).
  • Ensure compatibility with common ML data loading operations (e.g., batching, transforms).
  • Example usage should be part of the documentation.
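One possible shape for the documented schema is a JSON manifest with one record per exported sample. This is only a sketch: `ExportedSample`, its field names, and the `schema_version` key are assumptions made for illustration, not HyperView's actual data model.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ExportedSample:
    # Hypothetical record; every field name here is an assumption.
    sample_id: str
    path: str                      # file location relative to the export root
    label: Optional[str] = None    # None for unlabeled samples
    metadata: dict = field(default_factory=dict)


def to_manifest(samples):
    """Serialize samples to a JSON string tagged with a schema version."""
    doc = {"schema_version": 1, "samples": [asdict(s) for s in samples]}
    return json.dumps(doc, indent=2)


def from_manifest(text):
    """Parse a manifest string back into ExportedSample records."""
    doc = json.loads(text)
    if doc.get("schema_version") != 1:
        raise ValueError("unsupported manifest schema version")
    return [ExportedSample(**entry) for entry in doc["samples"]]


# Round trip two illustrative records through the manifest format.
records = [
    ExportedSample("a", "cat/a.png", "cat", {"split": "train"}),
    ExportedSample("b", "unlabeled/b.png"),
]
restored = from_manifest(to_manifest(records))
```

Keeping a version number in the manifest lets the loader reject files written under a different schema instead of misreading them.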

Suggested Implementation Steps

  1. Add a selection mechanism to the sample view (e.g., checkboxes, multi-select).
  2. Implement an export option in the UI for the selected samples.
  3. On export, package the selected samples into a Torch-compatible dataset and serialize it (e.g., as a .pt file or a folder structure).
  4. Provide sample code (Python) for loading/exporting the dataset and for a minimal training loop using the exported dataset.
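The folder-structure option in step 3 could look roughly like the sketch below. It assumes each sample is a plain dict with `id`, `path`, and `label` keys and that the UI passes the selected ids to the backend; `export_selection` and the `manifest.json` file name are hypothetical. Files are copied into one subfolder per label, the layout that torchvision's `ImageFolder` conventionally reads.

```python
import json
import shutil
import tempfile
from pathlib import Path


def export_selection(samples, selected_ids, out_dir):
    """Copy the selected samples into out_dir/<label>/ and write a manifest."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    selected = set(selected_ids)
    manifest = []
    for sample in samples:
        if sample["id"] not in selected:
            continue  # skip samples the user did not select
        src = Path(sample["path"])
        class_dir = out_dir / sample["label"]
        class_dir.mkdir(exist_ok=True)
        dst = class_dir / f"{sample['id']}{src.suffix}"
        shutil.copy2(src, dst)
        manifest.append({
            "id": sample["id"],
            "label": sample["label"],
            "path": dst.relative_to(out_dir).as_posix(),
        })
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


# Demo with placeholder files standing in for real images.
workdir = Path(tempfile.mkdtemp())
source_dir = workdir / "source"
source_dir.mkdir()
samples = []
for sample_id, label in [("a", "cat"), ("b", "dog"), ("c", "cat")]:
    file_path = source_dir / f"{sample_id}.png"
    file_path.write_bytes(b"placeholder")
    samples.append({"id": sample_id, "path": str(file_path), "label": label})

manifest = export_selection(samples, ["a", "b"], workdir / "export")
```

Storing paths relative to the export root keeps the exported folder relocatable.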

Documentation

  • Add documentation to the repo on the selection/export workflow.
  • Include code snippets for using the exported dataset in PyTorch.
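A documentation snippet might resemble the following sketch, which loads a `.pt` export into a map-style `torch.utils.data.Dataset`, batches it with `DataLoader`, and runs one training epoch. The file layout (a dict with `features`, `labels`, and `sample_ids`) and the `SelectionDataset` class are assumptions; random tensors stand in for a real export.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class SelectionDataset(Dataset):
    """Map-style dataset over an exported selection (hypothetical .pt layout)."""

    def __init__(self, path, transform=None):
        payload = torch.load(path, weights_only=True)
        self.features = payload["features"]
        self.labels = payload["labels"]
        self.sample_ids = payload["sample_ids"]  # kept for traceability
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        x = self.features[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.labels[index]


def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over the loader and return the mean per-sample loss."""
    model.train()
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(y)
    return total / len(loader.dataset)


# Stand-in for a real export: 32 random 8-dimensional samples, 3 classes.
torch.manual_seed(0)
export_path = Path(tempfile.mkdtemp()) / "selection.pt"
torch.save(
    {
        "features": torch.randn(32, 8),
        "labels": torch.randint(0, 3, (32,)),
        "sample_ids": [f"s{i}" for i in range(32)],
    },
    export_path,
)

dataset = SelectionDataset(export_path, transform=lambda x: x * 2.0)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
model = nn.Linear(8, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epoch_loss = train_one_epoch(model, loader, optimizer, nn.CrossEntropyLoss())
```

Because the class implements `__len__` and `__getitem__`, batching and shuffling come from `DataLoader` without extra code, and the optional `transform` hook covers per-sample preprocessing.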

Acceptance Criteria

  • Users can select samples and export them as a Torch dataset.
  • Exported dataset loads in PyTorch and preserves each sample's labels and metadata.
  • Documentation and example code are available.

Metadata

  • Labels: enhancement
  • Status: Backlog