
Update HF dataset cards #239

Merged
prrao87 merged 2 commits into main from fix/update-hf-cards on May 15, 2026

prrao87 commented May 15, 2026

Rewrites every published Hugging Face dataset card in this repo to a canonical six-section body: Search, Curate, Evolve, Train, Versioning, Materialize a subset.

What changes in each card

  • Consistent top-of-card spine. YAML frontmatter → title + summary → key features → splits → schema → pre-built indices → "Why Lance?" → loaders (datasets.load_dataset → LanceDB → pylance) → local-download tip.
  • Single LanceDB docs link. It appears once per card, in the LanceDB loader intro, and is not repeated elsewhere.
  • One pylance entry-point snippet at the top of every card; the six body sections all use LanceDB.
  • Materialization discipline. No unbounded to_table / to_pandas / to_arrow / to_pylist. In-memory terminals are bounded by .limit(k).to_list(). .to_batches() is reserved for the Materialize-a-subset section, piping a filtered scan straight into a new local table.
  • Hub paths in read-only sections, local paths in mutating sections — Search, Curate, Versioning lookups all run against hf://datasets/...; Evolve, tag creation, and local-index builds run against .//data with a one-line pointer back to the download tip.
  • SQL-first idioms for filters (.where("...")) and column derivation (add_columns({"col": "SQL_EXPRESSION"})).
  • Real column names throughout, so every snippet is runnable copy-paste against the published dataset.
  • Modality-aware Train sections. Image, image+text, annotation, and text-only cards use the standard Permutation.identity(tbl).select_columns([...]) pattern wrapped in a PyTorch DataLoader. Video and audio cards frame Train as a pre-extract-once / train-on-derived-features workflow, with a note that the inline blob column earns its place around the training process (inspection, sampling, evaluation, the pre-extraction pass) rather than on the per-batch hot path.
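
The materialization rule above lends itself to a mechanical check. The following is a minimal sketch of such a lint, not something this PR ships: the helper name `find_unbounded_calls`, the regex approach, and the `label` column in the sample snippets are all assumptions made for illustration. It is also stricter than the rule as written, since it flags every use of the four listed terminals rather than trying to prove that a `.limit(k)` precedes them.

```python
import re

# Terminals the PR description lists under "Materialization discipline".
UNBOUNDED = ("to_table", "to_pandas", "to_arrow", "to_pylist")

# Matches a method call such as `.to_pandas(`; `.to_list(` and
# `.to_batches(` are deliberately not in the pattern.
_CALL = re.compile(r"\.(%s)\s*\(" % "|".join(UNBOUNDED))


def find_unbounded_calls(snippet: str) -> list[tuple[int, str]]:
    """Return (line_number, method_name) for every flagged call in a snippet.

    Hypothetical helper: it scans text only and does not import or run
    the snippet, so it works without LanceDB installed.
    """
    hits = []
    for lineno, line in enumerate(snippet.splitlines(), start=1):
        # Naive comment stripping; a '#' inside a string literal would
        # also cut the line short, which is acceptable for a sketch.
        code = line.split("#", 1)[0]
        for match in _CALL.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits


# Sample card snippets held as plain strings (column name is made up).
good = 'rows = tbl.search().where("label = 3").limit(10).to_list()'
bad = 'df = tbl.search().where("label = 3").to_pandas()'

print(find_unbounded_calls(good))
print(find_unbounded_calls(bad))
```

Run over the fenced code blocks of each card, a check like this would report the line and method of any terminal that the rule excludes, while leaving the bounded `.limit(k).to_list()` form and the `.to_batches()` export path untouched.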

mintlify Bot commented May 15, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| lancedb-bcbb4faf | 🟢 Ready | View Preview | May 15, 2026, 6:50 PM |

prrao87 merged commit a47336f into main on May 15, 2026
2 checks passed
prrao87 deleted the fix/update-hf-cards branch on May 15, 2026, 18:50