docs: restructure dataset schema page with introduction and guidance #2554
jancurn wants to merge 17 commits into
Adds a comprehensive "Why use views" section to dataset schema docs that explains the purpose and benefits of views, when to use them, how to organize views by use case, and what views are NOT for. Also includes a practical multi-view example for an e-commerce scraper.

This addresses feedback that the documentation explained HOW to configure views but not WHY or WHEN to use them.

Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3
https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
- Shorten the why/when content significantly
- Add link to Google Maps Scraper as real-world example
- Keep the anti-pattern note (useful guidance)
- Remove redundant explanations

https://claude.ai/code/session_018Upw3aA9syy5Jm84F1xp9f
Adds clarification that views only affect Console UI display, not how data is exported to JSON, CSV, or other formats. https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
Reorganizes the page to provide better context before diving into details:
- Adds introduction explaining what dataset schema is and its two components
- Moves file structure section before examples
- Reorganizes into clear Fields and Views sections as parallel concepts
- Consolidates reference tables at the end
- Maintains all existing content but in a more logical flow

The page now follows the same pattern as other actor definition pages (input_schema, output_schema) where concepts are introduced before examples.

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
"fields": { /* JSON Schema describing each item */ },
Instead of these comments, keep the short examples there.
Addresses review feedback to show actual field/view examples instead of comments in the schema components overview. https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
TC-MO left a comment
Thanks Jan! Left some inline suggestions, mostly:
- Bold reserved for UI elements (a few bullet lists use term as a label pattern)
- "Output tab UI" > "Output tab" for consistency (the tab is in the UI)
- A couple of gerund headings and one all-caps "NOT" to soften
- Small tightening passes where prose restates what's just above
- One technical fix: $schema in the example uses draft-07 but the reference table specifies Draft 2020-12, changed it for consistency's sake
One pattern that's worth applying consistently across the page (not just the bullets at LOC15-16): views is required, fields is optional, so required-first ordering would apply to:
- The example JSON at LOC20-38 (views before fields)
- The major sections (move Views section above Fields section)
- The reference table at LOC427-428 (views row before fields row)
Happy to make those changes myself if that is easier.
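The two points in this review (the `$schema` dialect and required-first ordering) can be sketched together. This is only an illustration: the two dialect URIs are the standard JSON Schema identifiers, but the view name `overview`, the `title` field, and the assumed layout of a view (`title`, `transformation.fields`, `display.component`) are hypothetical placeholders rather than the page's actual example.

```javascript
// Standard JSON Schema dialect identifiers.
const DRAFT_07 = "http://json-schema.org/draft-07/schema#";
const DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema";

// Hypothetical dataset schema. Per the review, `views` (required) is
// listed before `fields` (optional).
const datasetSchema = {
    actorSpecification: 1,
    views: {
        overview: {
            title: "Overview",
            transformation: { fields: ["title"] },
            display: { component: "table" },
        },
    },
    fields: {
        // Matches the dialect named in the reference table.
        $schema: DRAFT_2020_12,
        type: "object",
        properties: { title: { type: "string" } },
    },
};

// String keys keep insertion order, so this reflects the written order.
const topLevelKeys = Object.keys(datasetSchema);
```

Here `topLevelKeys` lists `views` ahead of `fields`, mirroring the ordering suggested for the example JSON, the major sections, and the reference table.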
The first view defined becomes the default tab.
I would cut this; it is already stated at LOC 213.
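For context, the behavior that sentence describes can be sketched as follows, assuming views are read in declaration order. The view names, field names, and the helper `defaultViewKey` are all hypothetical and only illustrate the idea; they are not taken from the docs page.

```javascript
// Hypothetical multi-view schema for an e-commerce scraper.
// View and field names are illustrative only.
const datasetSchema = {
    actorSpecification: 1,
    views: {
        overview: {
            title: "Overview",
            transformation: { fields: ["title", "price"] },
            display: { component: "table" },
        },
        reviews: {
            title: "Reviews",
            transformation: { fields: ["title", "rating", "reviewCount"] },
            display: { component: "table" },
        },
    },
};

// Hypothetical helper: string keys keep insertion order in JavaScript
// objects, so the first key is the first view defined.
function defaultViewKey(schema) {
    return Object.keys(schema.views)[0];
}

const defaultView = defaultViewKey(datasetSchema); // "overview"
```

Swapping the order of `overview` and `reviews` in the object would change which view this helper returns.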
### Flatten in Actor code

Alternatively, flatten nested structures in your Actor code before calling `Actor.pushData()`.
Single-sentence subsections are not great practice. I would recommend removing the H3 and folding it into the previous section.
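As a rough illustration of what the quoted sentence means by flattening in Actor code, here is a minimal sketch. The helper `flattenItem`, the sample item, and the dot-separated key format are assumptions made for this example; the docs page may use a different convention.

```javascript
// Hypothetical helper: flattens nested plain objects into dot-separated keys.
// Arrays, primitives, empty objects, and class instances are copied as-is.
function flattenItem(item, prefix = "") {
    const flat = {};
    for (const [key, value] of Object.entries(item)) {
        const path = prefix ? `${prefix}.${key}` : key;
        const isNonEmptyPlainObject =
            value !== null &&
            typeof value === "object" &&
            Object.getPrototypeOf(value) === Object.prototype &&
            Object.keys(value).length > 0;
        if (isNonEmptyPlainObject) {
            Object.assign(flat, flattenItem(value, path));
        } else {
            flat[path] = value;
        }
    }
    return flat;
}

// Hypothetical scraped item with one nested object.
const item = {
    name: "Cafe",
    location: { lat: 50.08, lng: 14.43 },
    tags: ["coffee"],
};
const flatItem = flattenItem(item);
// flatItem: { name: "Cafe", "location.lat": 50.08, "location.lng": 14.43, tags: ["coffee"] }

// In an Actor, the flattened item would then be stored with:
// await Actor.pushData(flatItem);
```

The `Actor.pushData()` call is left as a comment so the sketch runs without the SDK installed.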
…chema/index.md Co-authored-by: Michał Olender <92638966+TC-MO@users.noreply.github.com>
- Remove redundant "UI" from "Output tab UI" throughout
- Change "two main components" to "two components"
- Remove sentence that rephrases the bullets
- Simplify field and AI agent descriptions
- Rename "Example with field metadata" to "Field metadata example"
- Update JSON Schema to 2020-12 draft for consistency
- Reword validation link to start with verb
- Remove bold from bullet points (reserved for UI elements)
- Change "What views are NOT for" to "What views are not for"
- Simplify Google Maps example sentence
- Fold single-sentence subsection into previous section
- Various prose cleanup for conciseness

https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG
Summary
Restructures the dataset schema documentation to provide proper context before diving into details:
[…] `fields` and `views`)
The page now follows the same pattern as other actor definition pages (input_schema, output_schema).
Context
Based on feedback from Martin Sabo and Jaroslav Hejlek in #dev-docs - the documentation explained HOW to configure views but not WHY or WHEN to use them.
Slack thread: https://apify.slack.com/archives/C010Q0FBYG3/p1779357816377359?thread_ts=1779115904.940779&cid=C010Q0FBYG3
Test plan
https://claude.ai/code/session_01JyTmwWUsZaN7436BBgxwvG