diff --git a/ROADMAP-TO-4.0.md b/ROADMAP-TO-4.0.md
index 7506898d..034b1b3b 100644
--- a/ROADMAP-TO-4.0.md
+++ b/ROADMAP-TO-4.0.md
@@ -22,7 +22,7 @@ The constructor for the `Table` object should take some parameters to specify pr
 * `.where()`: an iterator for querying with conditions that are evaluated with the internal compute engine.
 * `.index()` for indexing a column and getting better performance in queries (desirable, but optional for 4.0).
 
-In particular, it should try to mimic much of the functionality of data-querying libraries such as ``pandas`` (see [this blog](https://datapythonista.me/blog/whats-new-in-pandas-3) for much of the followin). Hence, one should be able to filter rows of the `Table` via querying on multiple columns (accessed via `.` or perhaps ``__getitem__``), with conditions to select rows implemented via `.index`, `.where` like so
+In particular, it should try to mimic much of the functionality of data-querying libraries such as ``pandas`` (see [this blog](https://datapythonista.me/blog/whats-new-in-pandas-3) for much of the following). Hence, one should be able to filter rows of the `Table` via querying on multiple columns (accessed via `.` or perhaps ``__getitem__``), with conditions to select rows implemented via `.index`, `.where` like so
 
 ```
 tbl.where((tbl.property_type == "hotel") & (tbl.country == "us"))
diff --git a/bench/ctable/extend_vs_apend.py b/bench/ctable/extend_vs_append.py
similarity index 100%
rename from bench/ctable/extend_vs_apend.py
rename to bench/ctable/extend_vs_append.py
diff --git a/bench/ctable/row_acces.py b/bench/ctable/row_access.py
similarity index 100%
rename from bench/ctable/row_acces.py
rename to bench/ctable/row_access.py
diff --git a/examples/embeded-expr-udf-b2z.py b/examples/embedded-expr-udf-b2z.py
similarity index 100%
rename from examples/embeded-expr-udf-b2z.py
rename to examples/embedded-expr-udf-b2z.py
diff --git a/plans/containers-tutorial.md b/plans/containers-tutorial.md
index 8927fd44..e9c53e10 100644
--- a/plans/containers-tutorial.md
+++ b/plans/containers-tutorial.md
@@ -1,23 +1,14 @@
-# Plan: Containers Tutorial
+# Containers Tutorial Plan
 
 Target notebook:
 `doc/getting_started/tutorials/13.containers.ipynb`
 
-## Goal
+Goal:
+Create a small but comprehensive tutorial for the main `python-blosc2` data containers, designed for reading and running as a Jupyter notebook. The style should be concept-first, visual, and practical, similar in spirit to the FFmpeg libav tutorial referenced by the user: short explanations, clear mental models, small runnable examples, and lightweight but expressive figures.
 
-Land a solid v1 tutorial that gives users a practical mental map of the main
-`python-blosc2` containers, with short runnable examples and a small number of
-supporting diagrams.
+## Scope
 
-This tutorial should answer:
-
-- what each container is
-- how the containers relate to each other
-- which container to choose for a given workflow
-
-## Scope For V1
-
-Included sections:
+The tutorial will cover these main containers, in this order:
 
 1. `SChunk`
 2. `NDArray`
@@ -28,83 +19,499 @@ Included sections:
 7. `TreeStore`
 8. `C2Array`
 
-V1 constraints:
+Notes:
+
+- `SChunk` comes first because it is the basis for the higher-level local containers.
+- `Batch` is not part of the main list because it is a view returned by `BatchArray`, not a top-level container.
+- `Batch` can still be mentioned briefly inside the `BatchArray` section.
+
+## Proposed Table of Contents
+
+1. Introduction
+   Explain what “container” means in `python-blosc2` and what the tutorial covers.
+
+2. The Big Picture
+   Present a family overview of the containers and how they relate.
+
+3. `SChunk`: The Foundation
+   Explain that it is a sequence of compressed chunks plus metadata.
+   Show why it is the storage basis for higher-level containers.
+
+4. `NDArray`: Compressed N-D Arrays
+   Explain that it builds array semantics on top of an `SChunk`.
+   Cover slicing, chunking, persistence, and typical array workflows.
+
+5. `VLArray`: Variable-Length Items
+   Explain that it stores one serialized variable-length item per entry.
+   Position it as the ragged/object-like container.
+
+6. `BatchArray`: Batched Variable-Length Data
+   Explain that it stores batches in compressed chunks, with optional block-local reads inside each batch.
+   Position it for batch-oriented ingestion and access.
+
+7. `EmbedStore`: Bundle Several Containers Into One Store
+   Explain how it embeds serialized containers/nodes into one backing store.
+   Position it for portability and packaging.
+
+8. `DictStore`: Key-Value Collection of Containers
+   Explain the directory/zip-backed keyed collection model.
+   Position it for multi-object datasets.
+
+9. `TreeStore`: Hierarchical Datasets
+   Explain it as a hierarchical extension of `DictStore`.
+   Position it for tree-structured datasets.
+
+10. `C2Array`: Remote Arrays
+    Explain it as a remote handle over Caterva2/HTTP.
+    Position it for remote array access without full local copies.
+
+11. Choosing the Right Container
+    Provide a compact comparison table across the containers.
+
+12. Final Notes
+    Summarize common usage patterns and point to deeper documentation.
+
+## Per-Section Template
+
+Each main container section should follow the same pattern:
+
+1. Short description
+2. How it is implemented
+3. What features it provides
+4. What it is useful for
+5. Small runnable code example
+6. Small figure
+
+This repetition should make the notebook predictable and easy to scan.
+
+## Content Style
+
+The notebook should aim for:
+
+- short paragraphs
+- direct language
+- minimal theory beyond what is needed to build intuition
+- small examples that run quickly
+- progressive complexity
+- visuals that reinforce the mental model rather than decorate the page
+
+The notebook should avoid:
+
+- long API reference dumps
+- too many parameters in each example
+- overly abstract prose
+- figures that try to encode too much information at once
+
+## Image Strategy
+
+Images should be simple, consistent, and expressive.
+
+Recommended visual grammar:
+
+- deep blue outline: container
+- dark yellow blocks: compressed chunks / payload pieces
+- light blue strip: metadata
+- dashed arrows: references or remote access
+- folder/zip shapes only for store-like containers
+
+Preferred palette, based on the Blosc2 logo:
+
+- dark yellow: `#df9e00`
+- light blue: `#007a86`
+- deep blue: `#002a64`
+
+Suggested mapping:
+
+- deep blue (`#002a64`) for container outlines, titles, and main structural elements
+- dark yellow (`#df9e00`) for chunk payload blocks and highlighted storage pieces
+- light blue (`#007a86`) for metadata bands, secondary structure, and remote/reference accents
+
+The figures do not need to use only these colors, but these three should define the visual identity of the container diagrams.
+
+Recommended first-pass figure list:
+
+1. Overview map
+   Show the relationships among `SChunk`, `NDArray`, `VLArray`, `BatchArray`, `EmbedStore`, `DictStore`, `TreeStore`, and `C2Array`.
+
+2. `SChunk`
+   Show a sequence of compressed chunks plus metadata.
+
+3. `NDArray`
+   Show array semantics on top of chunked compressed storage.
+
+4. `VLArray` vs `BatchArray`
+   Side-by-side comparison:
+   `VLArray` as one variable-sized item per entry;
+   `BatchArray` as one chunk per batch with internal subdivision.
+
+5. `EmbedStore` / `DictStore` / `TreeStore`
+   Show the progression from embedded bundle to key-value store to hierarchical tree.
+
+This should be enough to make the notebook visual without overproducing assets.
+
+## Asset Format
+
+Preferred format: SVG
+
+Reasons:
+
+- crisp rendering in notebooks
+- easy to version-control
+- easy to tweak
+- lightweight and portable
+
+Suggested asset location:
+
+- `doc/getting_started/tutorials/images/containers/overview.svg`
+- `doc/getting_started/tutorials/images/containers/schunk.svg`
+- `doc/getting_started/tutorials/images/containers/ndarray.svg`
+- `doc/getting_started/tutorials/images/containers/vlarray-batcharray.svg`
+- `doc/getting_started/tutorials/images/containers/stores.svg`
+
+## Collaboration Workflow For Images
+
+Proposed workflow:
+
+1. Draft a one-line image spec for each figure.
+2. Review the metaphor and emphasis with the user.
+3. Produce a simple SVG draft.
+4. Review for clarity first, polish second.
+5. Revise if the figure is visually clean but conceptually ambiguous.
+
+The goal is not artistic polish. The goal is instant comprehension.
+
+## Suggested Notebook Flow
+
+The notebook itself should likely be built from cells in this order:
+
+1. Title and short intro
+2. Overview diagram
+3. One short “family map” section
+4. One section per container
+5. Comparison table
+6. Closing notes
+
+For each container section:
+
+- markdown cell with description and use cases
+- markdown cell or callout with implementation notes
+- code cell with tiny example
+- markdown cell with figure
+
+## Small Example Guidelines
+
+Examples should be:
+
+- short enough to fit in one notebook cell
+- independent where possible
+- fast to run
+- focused on one idea
+
+Examples should demonstrate:
+
+- `SChunk`: append/get/decompress or basic chunk operations
+- `NDArray`: create, persist, slice
+- `VLArray`: append and retrieve variable-length items
+- `BatchArray`: append a batch, iterate batches or items
+- `EmbedStore`: put/get a couple of nodes
+- `DictStore`: assign named entries
+- `TreeStore`: assign hierarchical paths and traverse
+- `C2Array`: open remote metadata and retrieve a small slice
+
+For `C2Array`, the example may need extra care because it depends on remote access; if needed, keep it lightweight and make clear that it requires network access.
+
+## Open Decisions For Next Iteration
+
+These should be settled next:
+
+1. Exact section titles and tone of the notebook.
+2. The first-pass image specs, one by one.
+3. Whether all code examples should be executable offline except `C2Array`.
+4. Whether to include one summary table near the top, near the end, or both.
+5. Whether to add one “common patterns” section showing how containers compose.
+
+## First-Pass SVG Image Specs
+
+These specs are meant to be simple enough to implement quickly, but concrete enough that the figures will already be useful in a first draft.
+
+### 1. `overview.svg`
+
+Purpose:
+
+- Give the reader a fast mental map of the container family.
+
+Core message:
+
+- `SChunk` is the storage foundation.
+- `NDArray`, `VLArray`, and `BatchArray` build on top of it.
+- `EmbedStore`, `DictStore`, and `TreeStore` organize multiple containers.
+- `C2Array` is the remote-facing member of the family.
+
+Suggested layout:
+
+- One central `SChunk` box.
+- Three boxes above or to the right: `NDArray`, `VLArray`, `BatchArray`.
+- Three store boxes further out: `EmbedStore`, `DictStore`, `TreeStore`.
+- One separate remote box: `C2Array`.
+- Solid arrows from `SChunk` to `NDArray`, `VLArray`, `BatchArray`.
+- Solid arrow from `DictStore` to `TreeStore`.
+- Dashed arrow between stores and `C2Array` to indicate references/remote links.
+
+Suggested labels inside boxes:
+
+- `SChunk`: “compressed chunks + metadata”
+- `NDArray`: “N-D array semantics”
+- `VLArray`: “one variable-length item per entry”
+- `BatchArray`: “one batch per chunk”
+- `EmbedStore`: “embedded nodes”
+- `DictStore`: “named collection”
+- `TreeStore`: “hierarchical collection”
+- `C2Array`: “remote array handle”
+
+Visual note:
+
+- This should be the least detailed figure, optimized for orientation.
+
+### 2. `schunk.svg`
+
+Purpose:
+
+- Show what an `SChunk` physically/conceptually looks like.
+
+Core message:
+
+- `SChunk` is a sequence of compressed chunks plus metadata.
+
+Suggested layout:
+
+- One large horizontal container box labeled `SChunk`.
+- Green strip at the top or left labeled `meta / vlmeta`.
+- Several orange rectangular blocks inside labeled `chunk 0`, `chunk 1`, `chunk 2`, `...`.
+- Optional small caption under the box: “persistent or in-memory”.
+
+Suggested callouts:
+
+- “append/update/delete chunks”
+- “compressed payload”
+- “basis for higher-level containers”
+
+Visual note:
+
+- This should be the simplest figure of the set.
+
+### 3. `ndarray.svg`
+
+Purpose:
+
+- Show that `NDArray` is an array interface over chunked compressed storage.
+
+Core message:
+
+- `NDArray` provides shape/dtype/slicing semantics on top of an `SChunk`.
+
+Suggested layout:
+
+- Top layer: a blue `NDArray` box with small labels:
+  `shape`, `dtype`, `chunks`, `blocks`
+- Under it: a simplified grid or 1D strip labeled “logical array view”.
+- Under that: an `SChunk` box with orange chunk blocks.
+- Arrow from `NDArray` to `SChunk`.
+
+Suggested callouts:
+
+- “array semantics”
+- “slicing”
+- “persistent `.b2nd` or in-memory”
+
+Visual note:
+
+- Show the distinction between logical array view and physical chunk storage.
+
+### 4. `vlarray.svg`
+
+Purpose:
+
+- Show how `VLArray` differs from `NDArray` and why it fits variable-length values.
+
+Core message:
+
+- One logical entry maps to one independently compressed serialized payload.
+
+Suggested layout:
+
+- Left side: a vertical list labeled `VLArray` with entries like:
+  `{"a": 1}`
+  `"hello"`
+  `[1, 2, 3, 4]`
+  `b"..."`
+- Right side: an `SChunk` with orange blocks of visibly different lengths.
+- Arrows from each logical entry to one chunk block.
+
+Suggested callouts:
+
+- “serialize”
+- “compress”
+- “independent entries”
+
+Visual note:
+
+- Different block widths are important here to visually reinforce variable-length storage.
+
+### 5. `vlarray-batcharray.svg`
+
+Purpose:
+
+- Compare `VLArray` and `BatchArray` directly.
+
+Core message:
+
+- `VLArray`: one item per chunk.
+- `BatchArray`: one batch per chunk, possibly subdivided internally.
+
+Suggested layout:
+
+- Two panels side by side.
+
+Left panel:
+
+- `VLArray`
+- Three logical entries, each mapping to one orange block.
+
+Right panel:
+
+- `BatchArray`
+- Three logical batches, each mapping to one larger orange block.
+- Inside each batch block, draw smaller subdivisions to suggest internal blocks/items.
+
+Suggested callouts:
+
+- left: “fine-grained item storage”
+- right: “batch-oriented storage”
+
+Visual note:
+
+- This should make the contrast obvious at a glance.
+
+### 6. `embedstore.svg`
+
+Purpose:
+
+- Show that `EmbedStore` bundles different nodes into one backing store.
+
+Core message:
+
+- Multiple containers are embedded into one portable store.
+
+Suggested layout:
+
+- One big blue outer box labeled `EmbedStore`.
+- Inside it:
+  one small `NDArray` node,
+  one `SChunk` node,
+  one `VLArray` or `BatchArray` node,
+  one dashed-link node for `C2Array` reference.
+- A green map/index strip on one side labeled `key -> offset/length`.
+
+Suggested callouts:
+
+- “single bundled store”
+- “embedded serialized nodes”
+- “remote references possible”
+
+Visual note:
+
+- Keep it compact; this image is about bundling, not hierarchy.
+
+### 7. `dictstore.svg`
+
+Purpose:
+
+- Show `DictStore` as a named collection with embedded and external leaves.
+
+Core message:
+
+- `DictStore` organizes multiple named objects in a directory/zip-like structure.
+
+Suggested layout:
+
+- Folder or zip-shaped outer boundary labeled `DictStore`.
+- Inside:
+  one embedded file-like box labeled `embed.b2e`,
+  a few external leaves with names like `a.b2nd`, `b.b2b`, `c.b2f`.
+- A short list of sample keys on the left:
+  `/a`, `/b`, `/c`
+
+Suggested callouts:
+
+- “named collection”
+- “embedded + external storage”
+- “`.b2d` / `.b2z`”
+
+Visual note:
+
+- This figure should emphasize storage organization rather than data layout.
+
+### 8. `stores.svg`
+
+Purpose:
+
+- Show the progression from `EmbedStore` to `DictStore` to `TreeStore`.
+
+Core message:
 
-- every local-container section gets a short runnable example
-- `C2Array` remains offline-safe by default and does not fetch remote data
-- only a small number of diagrams are required
-- the notebook must be linked from `doc/getting_started/tutorials.rst`
+- The stores differ mainly in how they organize multiple objects.
 
-## Current Status
+Suggested layout:
 
-- `13.containers.ipynb` now has runnable v1 content for the local containers
-- `overview.svg` is now present under `doc/getting_started/tutorials/images/containers/`
-- `doc/getting_started/tutorials.rst` now indexes tutorials `12`, `13`, and `14`
-- the notebook keeps `C2Array` offline-safe by default
+- Three panels left-to-right:
+  `EmbedStore` -> `DictStore` -> `TreeStore`
+- `EmbedStore`: simple bundle
+- `DictStore`: flat named collection
+- `TreeStore`: hierarchical `/group/subgroup/node`
 
-## Implementation Plan
+Suggested callouts:
 
-### Phase 1: Make The Tutorial Landable
+- `EmbedStore`: “bundle”
+- `DictStore`: “flat keys”
+- `TreeStore`: “hierarchical keys”
 
-- add `plans/containers-tutorial.md` as the live plan and progress record
-- index `12.batcharray`, `13.containers`, and `14.indexing-arrays` in
-  `doc/getting_started/tutorials.rst`
-- keep the tutorial focused on current APIs only
+Visual note:
 
-### Phase 2: Replace Placeholder Cells
+- This should help readers understand why both `DictStore` and `TreeStore` exist.
 
-- add a shared setup cell for imports, temp paths, and cleanup helpers
-- replace every `TODO` code cell with a compact runnable example
-- use local temp paths so repeated notebook runs stay deterministic
+### 9. `c2array.svg`
 
-Planned examples:
+Purpose:
 
-- `SChunk`: create, append chunks, inspect chunk counts and metadata
-- `NDArray`: create, persist, slice, reopen
-- `VLArray`: append variable-length values, inspect entries, reopen
-- `BatchArray`: append batches, inspect per-batch and item-level access
-- `EmbedStore`: bundle a couple of objects in one container
-- `DictStore`: store named leaves and reopen them
-- `TreeStore`: create a small hierarchy and walk a subtree
-- `C2Array`: show the URLPath/open pattern with remote access disabled by default
+- Show `C2Array` as a remote array handle.
 
-### Phase 3: Trim The Visual Scope
+Core message:
 
-- keep `overview.svg` as the main family diagram
-- add one lightweight store-oriented diagram if needed
-- remove per-section placeholder figure text that would otherwise make the
-  notebook feel unfinished
+- `C2Array` does not own local storage in the same way; it points to remote array data and fetches metadata/slices on demand.
 
-### Phase 4: Verify
+Suggested layout:
 
-- run the notebook or equivalent code path checks locally
-- confirm that all image paths resolve
-- confirm that the tutorial appears in the rendered docs index
+- Left: local client box labeled `C2Array`.
+- Middle: dashed network arrow labeled `HTTP`.
+- Right: remote service/cloud box containing a remote array rectangle.
+- Optional small metadata card near the client:
+  `shape`, `dtype`, `chunks`
 
-## Benefits
+Suggested callouts:
 
-This tutorial is worth continuing because it fills a real gap:
+- “remote metadata”
+- “remote slice fetch”
+- “Caterva2-backed”
 
-- current docs have single-feature tutorials, but not a family overview
-- users can see how `SChunk`, array containers, and store containers fit together
-- the comparison section helps users choose the right container earlier
-- it should reduce confusion around when to use `VLArray`, `BatchArray`,
-  `EmbedStore`, `DictStore`, or `TreeStore`
+Visual note:
 
-## Progress Log
+- Keep this visually distinct from the local-storage figures.
 
-### 2026-04-12
+## Recommended Next Steps
 
-- decided to keep a narrow v1 scope
-- decided to favor runnable examples over a large diagram set
-- decided to keep `C2Array` offline-safe by default
-- completed:
-  - added `plans/containers-tutorial.md`
-  - indexed `12.batcharray`, `13.containers`, and `14.indexing-arrays`
-  - replaced placeholder cells in `13.containers.ipynb` with runnable examples
-  - added `images/containers/overview.svg`
-  - added a reduced-scope asset README
-- verification:
-  - all notebook code cells executed successfully in a direct Python pass
-  - `jupyter nbconvert --execute` could not be used in the sandbox because the
-    Jupyter kernel could not bind local ports
+1. Finalize the exact notebook outline and section titles.
+2. Review and refine these image specs.
+3. Create the notebook skeleton with markdown headings and placeholder figure cells.
+4. Fill in the runnable examples.
+5. Add the SVG assets.
+6. Refine the narrative and transitions between sections.
diff --git a/plans/containters-tutorial.md b/plans/containters-tutorial.md
deleted file mode 100644
index e9c53e10..00000000
--- a/plans/containters-tutorial.md
+++ /dev/null
@@ -1,517 +0,0 @@
-# Containers Tutorial Plan
-
-Target notebook:
-`doc/getting_started/tutorials/13.containers.ipynb`
-
-Goal:
-Create a small but comprehensive tutorial for the main `python-blosc2` data containers, designed for reading and running as a Jupyter notebook. The style should be concept-first, visual, and practical, similar in spirit to the FFmpeg libav tutorial referenced by the user: short explanations, clear mental models, small runnable examples, and lightweight but expressive figures.
-
-## Scope
-
-The tutorial will cover these main containers, in this order:
-
-1. `SChunk`
-2. `NDArray`
-3. `VLArray`
-4. `BatchArray`
-5. `EmbedStore`
-6. `DictStore`
-7. `TreeStore`
-8. `C2Array`
-
-Notes:
-
-- `SChunk` comes first because it is the basis for the higher-level local containers.
-- `Batch` is not part of the main list because it is a view returned by `BatchArray`, not a top-level container.
-- `Batch` can still be mentioned briefly inside the `BatchArray` section.
-
-## Proposed Table of Contents
-
-1. Introduction
-   Explain what “container” means in `python-blosc2` and what the tutorial covers.
-
-2. The Big Picture
-   Present a family overview of the containers and how they relate.
-
-3. `SChunk`: The Foundation
-   Explain that it is a sequence of compressed chunks plus metadata.
-   Show why it is the storage basis for higher-level containers.
-
-4. `NDArray`: Compressed N-D Arrays
-   Explain that it builds array semantics on top of an `SChunk`.
-   Cover slicing, chunking, persistence, and typical array workflows.
-
-5. `VLArray`: Variable-Length Items
-   Explain that it stores one serialized variable-length item per entry.
-   Position it as the ragged/object-like container.
-
-6. `BatchArray`: Batched Variable-Length Data
-   Explain that it stores batches in compressed chunks, with optional block-local reads inside each batch.
-   Position it for batch-oriented ingestion and access.
-
-7. `EmbedStore`: Bundle Several Containers Into One Store
-   Explain how it embeds serialized containers/nodes into one backing store.
-   Position it for portability and packaging.
-
-8. `DictStore`: Key-Value Collection of Containers
-   Explain the directory/zip-backed keyed collection model.
-   Position it for multi-object datasets.
-
-9. `TreeStore`: Hierarchical Datasets
-   Explain it as a hierarchical extension of `DictStore`.
-   Position it for tree-structured datasets.
-
-10. `C2Array`: Remote Arrays
-    Explain it as a remote handle over Caterva2/HTTP.
-    Position it for remote array access without full local copies.
-
-11. Choosing the Right Container
-    Provide a compact comparison table across the containers.
-
-12. Final Notes
-    Summarize common usage patterns and point to deeper documentation.
-
-## Per-Section Template
-
-Each main container section should follow the same pattern:
-
-1. Short description
-2. How it is implemented
-3. What features it provides
-4. What it is useful for
-5. Small runnable code example
-6. Small figure
-
-This repetition should make the notebook predictable and easy to scan.
-
-## Content Style
-
-The notebook should aim for:
-
-- short paragraphs
-- direct language
-- minimal theory beyond what is needed to build intuition
-- small examples that run quickly
-- progressive complexity
-- visuals that reinforce the mental model rather than decorate the page
-
-The notebook should avoid:
-
-- long API reference dumps
-- too many parameters in each example
-- overly abstract prose
-- figures that try to encode too much information at once
-
-## Image Strategy
-
-Images should be simple, consistent, and expressive.
-
-Recommended visual grammar:
-
-- deep blue outline: container
-- dark yellow blocks: compressed chunks / payload pieces
-- light blue strip: metadata
-- dashed arrows: references or remote access
-- folder/zip shapes only for store-like containers
-
-Preferred palette, based on the Blosc2 logo:
-
-- dark yellow: `#df9e00`
-- light blue: `#007a86`
-- deep blue: `#002a64`
-
-Suggested mapping:
-
-- deep blue (`#002a64`) for container outlines, titles, and main structural elements
-- dark yellow (`#df9e00`) for chunk payload blocks and highlighted storage pieces
-- light blue (`#007a86`) for metadata bands, secondary structure, and remote/reference accents
-
-The figures do not need to use only these colors, but these three should define the visual identity of the container diagrams.
-
-Recommended first-pass figure list:
-
-1. Overview map
-   Show the relationships among `SChunk`, `NDArray`, `VLArray`, `BatchArray`, `EmbedStore`, `DictStore`, `TreeStore`, and `C2Array`.
-
-2. `SChunk`
-   Show a sequence of compressed chunks plus metadata.
-
-3. `NDArray`
-   Show array semantics on top of chunked compressed storage.
-
-4. `VLArray` vs `BatchArray`
-   Side-by-side comparison:
-   `VLArray` as one variable-sized item per entry;
-   `BatchArray` as one chunk per batch with internal subdivision.
-
-5. `EmbedStore` / `DictStore` / `TreeStore`
-   Show the progression from embedded bundle to key-value store to hierarchical tree.
-
-This should be enough to make the notebook visual without overproducing assets.
-
-## Asset Format
-
-Preferred format: SVG
-
-Reasons:
-
-- crisp rendering in notebooks
-- easy to version-control
-- easy to tweak
-- lightweight and portable
-
-Suggested asset location:
-
-- `doc/getting_started/tutorials/images/containers/overview.svg`
-- `doc/getting_started/tutorials/images/containers/schunk.svg`
-- `doc/getting_started/tutorials/images/containers/ndarray.svg`
-- `doc/getting_started/tutorials/images/containers/vlarray-batcharray.svg`
-- `doc/getting_started/tutorials/images/containers/stores.svg`
-
-## Collaboration Workflow For Images
-
-Proposed workflow:
-
-1. Draft a one-line image spec for each figure.
-2. Review the metaphor and emphasis with the user.
-3. Produce a simple SVG draft.
-4. Review for clarity first, polish second.
-5. Revise if the figure is visually clean but conceptually ambiguous.
-
-The goal is not artistic polish. The goal is instant comprehension.
-
-## Suggested Notebook Flow
-
-The notebook itself should likely be built from cells in this order:
-
-1. Title and short intro
-2. Overview diagram
-3. One short “family map” section
-4. One section per container
-5. Comparison table
-6. Closing notes
-
-For each container section:
-
-- markdown cell with description and use cases
-- markdown cell or callout with implementation notes
-- code cell with tiny example
-- markdown cell with figure
-
-## Small Example Guidelines
-
-Examples should be:
-
-- short enough to fit in one notebook cell
-- independent where possible
-- fast to run
-- focused on one idea
-
-Examples should demonstrate:
-
-- `SChunk`: append/get/decompress or basic chunk operations
-- `NDArray`: create, persist, slice
-- `VLArray`: append and retrieve variable-length items
-- `BatchArray`: append a batch, iterate batches or items
-- `EmbedStore`: put/get a couple of nodes
-- `DictStore`: assign named entries
-- `TreeStore`: assign hierarchical paths and traverse
-- `C2Array`: open remote metadata and retrieve a small slice
-
-For `C2Array`, the example may need extra care because it depends on remote access; if needed, keep it lightweight and make clear that it requires network access.
-
-## Open Decisions For Next Iteration
-
-These should be settled next:
-
-1. Exact section titles and tone of the notebook.
-2. The first-pass image specs, one by one.
-3. Whether all code examples should be executable offline except `C2Array`.
-4. Whether to include one summary table near the top, near the end, or both.
-5. Whether to add one “common patterns” section showing how containers compose.
-
-## First-Pass SVG Image Specs
-
-These specs are meant to be simple enough to implement quickly, but concrete enough that the figures will already be useful in a first draft.
-
-### 1. `overview.svg`
-
-Purpose:
-
-- Give the reader a fast mental map of the container family.
-
-Core message:
-
-- `SChunk` is the storage foundation.
-- `NDArray`, `VLArray`, and `BatchArray` build on top of it.
-- `EmbedStore`, `DictStore`, and `TreeStore` organize multiple containers.
-- `C2Array` is the remote-facing member of the family.
-
-Suggested layout:
-
-- One central `SChunk` box.
-- Three boxes above or to the right: `NDArray`, `VLArray`, `BatchArray`.
-- Three store boxes further out: `EmbedStore`, `DictStore`, `TreeStore`.
-- One separate remote box: `C2Array`.
-- Solid arrows from `SChunk` to `NDArray`, `VLArray`, `BatchArray`.
-- Solid arrow from `DictStore` to `TreeStore`.
-- Dashed arrow between stores and `C2Array` to indicate references/remote links.
-
-Suggested labels inside boxes:
-
-- `SChunk`: “compressed chunks + metadata”
-- `NDArray`: “N-D array semantics”
-- `VLArray`: “one variable-length item per entry”
-- `BatchArray`: “one batch per chunk”
-- `EmbedStore`: “embedded nodes”
-- `DictStore`: “named collection”
-- `TreeStore`: “hierarchical collection”
-- `C2Array`: “remote array handle”
-
-Visual note:
-
-- This should be the least detailed figure, optimized for orientation.
-
-### 2. `schunk.svg`
-
-Purpose:
-
-- Show what an `SChunk` physically/conceptually looks like.
-
-Core message:
-
-- `SChunk` is a sequence of compressed chunks plus metadata.
-
-Suggested layout:
-
-- One large horizontal container box labeled `SChunk`.
-- Green strip at the top or left labeled `meta / vlmeta`.
-- Several orange rectangular blocks inside labeled `chunk 0`, `chunk 1`, `chunk 2`, `...`.
-- Optional small caption under the box: “persistent or in-memory”.
-
-Suggested callouts:
-
-- “append/update/delete chunks”
-- “compressed payload”
-- “basis for higher-level containers”
-
-Visual note:
-
-- This should be the simplest figure of the set.
-
-### 3. `ndarray.svg`
-
-Purpose:
-
-- Show that `NDArray` is an array interface over chunked compressed storage.
-
-Core message:
-
-- `NDArray` provides shape/dtype/slicing semantics on top of an `SChunk`.
-
-Suggested layout:
-
-- Top layer: a blue `NDArray` box with small labels:
-  `shape`, `dtype`, `chunks`, `blocks`
-- Under it: a simplified grid or 1D strip labeled “logical array view”.
-- Under that: an `SChunk` box with orange chunk blocks.
-- Arrow from `NDArray` to `SChunk`.
-
-Suggested callouts:
-
-- “array semantics”
-- “slicing”
-- “persistent `.b2nd` or in-memory”
-
-Visual note:
-
-- Show the distinction between logical array view and physical chunk storage.
-
-### 4. `vlarray.svg`
-
-Purpose:
-
-- Show how `VLArray` differs from `NDArray` and why it fits variable-length values.
-
-Core message:
-
-- One logical entry maps to one independently compressed serialized payload.
-
-Suggested layout:
-
-- Left side: a vertical list labeled `VLArray` with entries like:
-  `{"a": 1}`
-  `"hello"`
-  `[1, 2, 3, 4]`
-  `b"..."`
-- Right side: an `SChunk` with orange blocks of visibly different lengths.
-- Arrows from each logical entry to one chunk block.
-
-Suggested callouts:
-
-- “serialize”
-- “compress”
-- “independent entries”
-
-Visual note:
-
-- Different block widths are important here to visually reinforce variable-length storage.
-
-### 5. `vlarray-batcharray.svg`
-
-Purpose:
-
-- Compare `VLArray` and `BatchArray` directly.
-
-Core message:
-
-- `VLArray`: one item per chunk.
-- `BatchArray`: one batch per chunk, possibly subdivided internally.
-
-Suggested layout:
-
-- Two panels side by side.
-
-Left panel:
-
-- `VLArray`
-- Three logical entries, each mapping to one orange block.
-
-Right panel:
-
-- `BatchArray`
-- Three logical batches, each mapping to one larger orange block.
-- Inside each batch block, draw smaller subdivisions to suggest internal blocks/items.
-
-Suggested callouts:
-
-- left: “fine-grained item storage”
-- right: “batch-oriented storage”
-
-Visual note:
-
-- This should make the contrast obvious at a glance.
-
-### 6. `embedstore.svg`
-
-Purpose:
-
-- Show that `EmbedStore` bundles different nodes into one backing store.
-
-Core message:
-
-- Multiple containers are embedded into one portable store.
-
-Suggested layout:
-
-- One big blue outer box labeled `EmbedStore`.
-- Inside it: - one small `NDArray` node, - one `SChunk` node, - one `VLArray` or `BatchArray` node, - one dashed-link node for `C2Array` reference. -- A green map/index strip on one side labeled `key -> offset/length`. - -Suggested callouts: - -- “single bundled store” -- “embedded serialized nodes” -- “remote references possible” - -Visual note: - -- Keep it compact; this image is about bundling, not hierarchy. - -### 7. `dictstore.svg` - -Purpose: - -- Show `DictStore` as a named collection with embedded and external leaves. - -Core message: - -- `DictStore` organizes multiple named objects in a directory/zip-like structure. - -Suggested layout: - -- Folder or zip-shaped outer boundary labeled `DictStore`. -- Inside: - one embedded file-like box labeled `embed.b2e`, - a few external leaves with names like `a.b2nd`, `b.b2b`, `c.b2f`. -- A short list of sample keys on the left: - `/a`, `/b`, `/c` - -Suggested callouts: - -- “named collection” -- “embedded + external storage” -- “`.b2d` / `.b2z`” - -Visual note: - -- This figure should emphasize storage organization rather than data layout. - -### 8. `stores.svg` - -Purpose: - -- Show the progression from `EmbedStore` to `DictStore` to `TreeStore`. - -Core message: - -- The stores differ mainly in how they organize multiple objects. - -Suggested layout: - -- Three panels left-to-right: - `EmbedStore` -> `DictStore` -> `TreeStore` -- `EmbedStore`: simple bundle -- `DictStore`: flat named collection -- `TreeStore`: hierarchical `/group/subgroup/node` - -Suggested callouts: - -- `EmbedStore`: “bundle” -- `DictStore`: “flat keys” -- `TreeStore`: “hierarchical keys” - -Visual note: - -- This should help readers understand why both `DictStore` and `TreeStore` exist. - -### 9. `c2array.svg` - -Purpose: - -- Show `C2Array` as a remote array handle. - -Core message: - -- `C2Array` does not own local storage in the same way; it points to remote array data and fetches metadata/slices on demand. 
- -Suggested layout: - -- Left: local client box labeled `C2Array`. -- Middle: dashed network arrow labeled `HTTP`. -- Right: remote service/cloud box containing a remote array rectangle. -- Optional small metadata card near the client: - `shape`, `dtype`, `chunks` - -Suggested callouts: - -- “remote metadata” -- “remote slice fetch” -- “Caterva2-backed” - -Visual note: - -- Keep this visually distinct from the local-storage figures. - -## Recommended Next Steps - -1. Finalize the exact notebook outline and section titles. -2. Review and refine these image specs. -3. Create the notebook skeleton with markdown headings and placeholder figure cells. -4. Fill in the runnable examples. -5. Add the SVG assets. -6. Refine the narrative and transitions between sections. diff --git a/tests/ndarray/test_linalg.py b/tests/ndarray/test_linalg.py index 9df3a23b..4ea435c4 100644 --- a/tests/ndarray/test_linalg.py +++ b/tests/ndarray/test_linalg.py @@ -1008,7 +1008,7 @@ def shape_chunks_blocks_4d(request): np.complex128(2 - 4j), # NumPy complex128 }, ) -def test_tranpose_scalars(scalar): +def test_transpose_scalars(scalar): scalar_t = blosc2.permute_dims(scalar) np_scalar_t = np.transpose(scalar) np.testing.assert_allclose(scalar_t, np_scalar_t) @@ -1154,7 +1154,7 @@ def test_T_raises(shape): _ = arr.T -def test_tranpose_disk(): +def test_transpose_disk(): a = blosc2.linspace(0, 1, shape=(3, 4), urlpath="a_test.b2nd", mode="w") c = blosc2.permute_dims(a, urlpath="c_test.b2nd", mode="w") diff --git a/tests/ndarray/test_reductions.py b/tests/ndarray/test_reductions.py index c5ece156..bd6c1a18 100644 --- a/tests/ndarray/test_reductions.py +++ b/tests/ndarray/test_reductions.py @@ -275,8 +275,8 @@ def test_broadcast_params(axis, keepdims, reduce_op, shapes): res = expr2 - getattr(expr1, reduce_op)(**reduce_args) oploc = "npcumsum" if reduce_op == "cumulative_sum" else "npcumprod" expr = f"na2 * na3 + 1 - {oploc}(na1 + na2 - na3, axis={axis}" - include_inital = 
reduce_args.get("include_initial", False) - expr += f", include_initial={keepdims})" if include_inital else ")" + include_initial = reduce_args.get("include_initial", False) + expr += f", include_initial={keepdims})" if include_initial else ")" else: res = expr1 - getattr(expr2, reduce_op)(**reduce_args) expr = f"na1 + na2 - na3 - (na2 * na3 + 1).{reduce_op}(axis={axis}, keepdims={keepdims})"