data-curation

Here are 160 public repositories matching this topic...

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Updated Jan 13, 2026
  • Python

fastdup is a free tool for quickly generating insights from image and video datasets. It helps improve the quality of both images and labels and reduces data operation costs at scale.

  • Updated Apr 14, 2026
  • Python

PyTorch dataset debugger for computer vision — pause training, mine live loss signals to surface mislabels, class imbalance & outliers, then curate your image, video & LiDAR data without restarting

  • Updated Jul 3, 2026
  • Python

Data Curation Tool is a local-first dataset curation, tagging, model-management, downloader, 3D generation, media metadata, and tool-orchestration application. The current documentation set is stored in docs/wiki/

  • Updated Jul 3, 2026
  • Python
