From 292257856d1be95f86c5dfface77796a3337ca61 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Zieli=C5=84ski?=
Date: Sun, 3 May 2026 23:15:10 +0200
Subject: [PATCH] docs: components/<Component>/README.md is the catalog source
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Consolidates the catalog's per-component markdown into each component's
README.md and retires bin/_docs_components/ entirely.

Before:
  components/<Component>/README.md — Composer/Packagist surface
  bin/_docs_components/<slug>.md   — docs-site catalog
  bin/_extract_catalog.py          — one-shot migration tool

After:
  components/<Component>/README.md — both Composer/Packagist surface AND
  docs-site catalog. Frontmatter at the top (slug / title / install /
  credit_* / see_also) drives the docs site; GitHub's renderer hides it
  from the README view on github.com.

Why: the two files had overlapping content (lede, intro prose) and
nothing structural prevented drift. Co-locating the catalog with the
component eliminates the duplication, removes the bin/_docs_components/
indirection, and keeps everything one Composer package needs in one file.

Pipeline change is small:

- bin/_load_catalog.py: COMPONENT_DIR replaced by COMPONENT_ORDER
  (slug → directory tuples, kept ordered) reading from
  components/<Component>/README.md.
- bin/run-snippets.py: --update writes captured stdout back into the
  same README via the same slug → directory map.
- bin/build-reference.py: docstring only.
- .github/workflows/{docs,snippet-tests}.yml: dropped
  `bin/_docs_components/**` from path filters.
- bin/_docs_components/ and bin/_extract_catalog.py deleted.
- bin/_docs_components.py keeps its small structural metadata
  (STARTER_PATHS, COMPONENT_GUIDES) plus `COMPONENTS = load_components()`.

Verified: bin/run-snippets.py --check → 87/87 still pass;
build-reference.py regenerates docs/reference/ cleanly.
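The frontmatter-driven flow described above can be sketched roughly as
follows. This is an illustrative Python sketch, not the actual
bin/_load_catalog.py; the key names (slug, title, install, see_also)
come from this commit's description, while the parsing details are
assumptions.

```python
import re

def split_frontmatter(markdown):
    """Return (metadata dict, body) for a README with leading `---` frontmatter.

    Sketch only: the real loader also handles YAML pipe (`|`) blocks and
    snippet metadata comments, which this deliberately skips.
    """
    match = re.match(r"\A---\n(.*?)\n---\n(.*)\Z", markdown, re.DOTALL)
    if not match:
        # No frontmatter: a plain README, so no catalog entry is produced.
        return {}, markdown
    meta = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            # Collect repeated keys (e.g. see_also) as lists.
            meta.setdefault(key.strip(), []).append(value.strip())
    # Single-valued keys collapse to plain strings; repeated keys stay lists.
    meta = {k: (v[0] if len(v) == 1 else v) for k, v in meta.items()}
    return meta, match.group(2)

readme = "---\nslug: zip\ntitle: Zip\ninstall: wp-php-toolkit/zip\n---\n# Zip\nBody text.\n"
meta, body = split_frontmatter(readme)
print(meta["slug"], meta["title"])  # prints: zip Zip
```

GitHub hides the leading frontmatter block when rendering the README, so
the same file can serve both readers and the build pipeline.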
Note: replacing the old per-package READMEs with the catalog content
drops some "API Reference" tables that lived only in the old READMEs
(Polyfill's function list, ByteStream's interfaces, Merge's strategies,
HttpServer's class list, etc.). Most of those were redundant with the
new "When to use which" tables already in the catalog; if any are worth
keeping, they can be added back as new sections in the README in a
follow-up — they'll then appear on the docs site too.
---
 .github/workflows/docs.yml                  |   1 -
 .github/workflows/snippet-tests.yml         |   3 +-
 .gitignore                                  |   1 +
 bin/_docs_components.py                     |  10 +-
 bin/_docs_components/README.md              | 131 ----
 bin/_docs_components/_order.txt             |  21 -
 bin/_docs_components/blockparser.md         | 301 ---------
 bin/_docs_components/blueprints.md          | 200 ------
 bin/_docs_components/bytestream.md          | 203 ------
 bin/_docs_components/cli.md                 | 232 -------
 bin/_docs_components/coding-standards.md    |  66 --
 bin/_docs_components/corsproxy.md           | 151 -----
 bin/_docs_components/dataliberation.md      | 282 --------
 bin/_docs_components/encoding.md            | 196 ------
 bin/_docs_components/filesystem.md          | 260 --------
 bin/_docs_components/git.md                 | 273 --------
 bin/_docs_components/html.md                | 414 ------------
 bin/_docs_components/httpclient.md          | 606 -----------------
 bin/_docs_components/httpserver.md          | 138 ----
 bin/_docs_components/markdown.md            | 224 -------
 bin/_docs_components/merge.md               | 247 -------
 bin/_docs_components/polyfill.md            | 174 -----
 bin/_docs_components/xml.md                 | 200 ------
 bin/_docs_components/zip.md                 | 398 -----------
 bin/_extract_catalog.py                     | 142 ----
 bin/_load_catalog.py                        |  65 +-
 bin/build-docs-bundle.sh                    |   4 +-
 bin/build-reference.py                      |  10 +-
 bin/run-snippets.py                         |  12 +-
 components/BlockParser/README.md            | 347 +++++++---
 components/Blueprints/README.md             | 389 ++++-------
 components/ByteStream/README.md             | 343 ++++------
 components/CLI/README.md                    | 281 +++++---
 components/CORSProxy/README.md              | 266 ++++----
 components/DataLiberation/README.md         | 466 ++++++-------
 components/Encoding/README.md               | 247 ++--
 components/Filesystem/README.md             | 290 +++--
 components/Git/README.md                    | 299 ++++---
 components/HTML/README.md                   | 437 +++++++---
 components/HttpClient/README.md             | 696 +++++++++++++-----
 components/HttpServer/README.md             | 244 +++---
 components/Markdown/README.md               | 279 +++---
 components/Merge/README.md                  | 355 +++++-----
 components/Polyfill/README.md               | 286 ++++----
 components/ToolkitCodingStandards/README.md | 145 ++--
 components/XML/README.md                    | 255 ++++---
 components/Zip/README.md                    | 434 +++++++---
 47 files changed, 3660 insertions(+), 7364 deletions(-)
 delete mode 100644 bin/_docs_components/README.md
 delete mode 100644 bin/_docs_components/_order.txt
 delete mode 100644 bin/_docs_components/blockparser.md
 delete mode 100644 bin/_docs_components/blueprints.md
 delete mode 100644 bin/_docs_components/bytestream.md
 delete mode 100644 bin/_docs_components/cli.md
 delete mode 100644 bin/_docs_components/coding-standards.md
 delete mode 100644 bin/_docs_components/corsproxy.md
 delete mode 100644 bin/_docs_components/dataliberation.md
 delete mode 100644 bin/_docs_components/encoding.md
 delete mode 100644 bin/_docs_components/filesystem.md
 delete mode 100644 bin/_docs_components/git.md
 delete mode 100644 bin/_docs_components/html.md
 delete mode 100644 bin/_docs_components/httpclient.md
 delete mode 100644 bin/_docs_components/httpserver.md
 delete mode 100644 bin/_docs_components/markdown.md
 delete mode 100644 bin/_docs_components/merge.md
 delete mode 100644 bin/_docs_components/polyfill.md
 delete mode 100644 bin/_docs_components/xml.md
 delete mode 100644 bin/_docs_components/zip.md
 delete mode 100644 bin/_extract_catalog.py

diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index 76379be29..42aaa4105 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -9,7 +9,6 @@ on:
       - 'bin/build-docs-bundle.sh'
       - 'bin/build-reference.py'
       - 'bin/_docs_components.py'
-      - 'bin/_docs_components/**'
       - 'bin/_load_catalog.py'
       - 'composer.json'
       - 'composer.lock'
diff --git a/.github/workflows/snippet-tests.yml b/.github/workflows/snippet-tests.yml
index 77a2f90fd..4d84ab70b 100644
--- a/.github/workflows/snippet-tests.yml
+++ b/.github/workflows/snippet-tests.yml
@@ -1,6 +1,6 @@
 name: Verify docs snippets
-# Runs every PHP snippet declared in bin/_docs_components/<slug>.md against
+# Runs every PHP snippet declared in components/<Component>/README.md against
 # the local toolkit and compares stdout to the expected-output block stored
 # next to the snippet in markdown. Anything that drifts fails CI; anything
 # that errors out also fails CI.
 
@@ -14,7 +14,6 @@ on:
     paths:
       - 'components/**'
       - 'bin/_docs_components.py'
-      - 'bin/_docs_components/**'
      - 'bin/_load_catalog.py'
       - 'bin/run-snippets.py'
       - 'composer.json'
diff --git a/.gitignore b/.gitignore
index f5a2e5fb1..3c130c40e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -57,3 +57,4 @@ docs/reference/zip.html
 
 # Bundled toolkit source for WordPress Playground — regenerated on every deploy.
 docs/assets/php-toolkit.zip
+docs-changes.md
diff --git a/bin/_docs_components.py b/bin/_docs_components.py
index f45bc4fb5..8a29ed7a3 100644
--- a/bin/_docs_components.py
+++ b/bin/_docs_components.py
@@ -1,10 +1,12 @@
 # Component catalog for the runnable docs site.
 #
 # Per-component content (lede, sections, snippets, credit callouts,
-# see-also links, expected snippet outputs) is sourced from
-# bin/_docs_components/<slug>.md — see bin/_load_catalog.py for the format.
-# That keeps the docs source in plain markdown with code-fence snippets,
-# editable in any text editor and renderable as-is on github.com.
+# see-also links, expected snippet outputs) is sourced from each
+# components/<Component>/README.md — see bin/_load_catalog.py for the format.
+# The README *is* the catalog source: GitHub and Packagist render it as
+# a normal README (frontmatter is hidden by GitHub's renderer); the
+# build pipeline parses the frontmatter + snippet metadata blocks to
+# generate the docs site and run snippets in CI.
 #
 # This file still owns the small global metadata that doesn't belong in any
 # single component's markdown: the landing-page starter paths and the
diff --git a/bin/_docs_components/README.md b/bin/_docs_components/README.md
deleted file mode 100644
index 9c056b463..000000000
--- a/bin/_docs_components/README.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Docs catalog source
-
-Each `.md` file in this directory is the source of truth for one
-component on the [PHP Toolkit docs site](https://wordpress.github.io/php-toolkit/).
-Editing a file here changes the rendered reference page **and** the snippet
-that runs in CI **and** the captured expected output that the page
-pre-renders before WordPress Playground boots.
-
-Component order on the site is controlled by `_order.txt` in this directory.
-
-## Format
-
-````markdown
----
-slug: <slug>
-title: <Title>
-install: <packagist-package>   # optional
-
-credit_title: <callout title>  # optional
-credit_body: |
-  <callout body, raw HTML>
-
-see_also: <slug> | <Title> | <reason>   # optional, repeatable
-see_also: <other-slug> | <Other> | <reason>
----
-
-<one-paragraph lede, raw HTML allowed (e.g. <code>...</code>)>
-
-## Section heading
-
-<body content for the section, raw HTML allowed>
-
-<!-- snippet:
-filename: example.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-// example code...
-echo "hello\n";
-```
-
-<!-- expected-output -->
-```
-hello
-```
-````
-
-### Rules
-
-- **Frontmatter** is required. `slug` must match the filename. `title` is
-  required. `install` is the Packagist package name (e.g.
-  `wp-php-toolkit/zip`); omit it for components that don't ship as a
-  separate package.
-
-- **`credit_title` + `credit_body`** render as a callout below the lede on
-  the reference page (used for "Ported from WordPress core" notes and the
-  like). Body text is raw HTML using the YAML pipe (`|`) form: each
-  continuation line is indented two spaces.
-
-- **`see_also`** lines render as the "See also" section at the bottom of
-  the page. Format: `<slug> | <Title> | <reason>`. Repeat the key once per
-  related component.
-
-- **The lede** is everything between the closing `---` and the first `## `
-  heading. Raw HTML allowed.
-
-- **Sections** start at `## Heading` (column zero). Each section's body is
-  the raw HTML between its heading and the next `## ` (or end of file).
-  Blank lines separate paragraphs; newlines inside `<pre><code>` and inside
-  fenced code blocks are preserved.
-
-- **Snippets** are optional, at most one per section, and have two parts:
-
-  1. An HTML comment with metadata:
-     ```
-     <!-- snippet:
-     filename: <name>.php
-     runnable: true | false
-     -->
-     ```
-     `filename` is required and uniquely identifies the snippet within the
-     component. `runnable` defaults to `true`.
-
-  2. A fenced PHP block immediately after the metadata comment. The fence
-     holds the snippet *verbatim* — including the opening `<?php` and the
-     autoload `require`. There is no implicit prelude. If the snippet
-     itself contains a triple-backtick run (e.g. a markdown sample inside a
-     heredoc), use a four-backtick fence; the loader matches the opening
-     length.
-
-- **Expected outputs** sit in a sibling fenced block right after the snippet's
-  php fence, marked with `<!-- expected-output -->`. The block holds the
-  captured stdout from running the snippet locally. The docs site uses it
-  to pre-render results before Playground boots; CI compares against it on
-  every PR.
-
-- **Pitfalls** are paragraphs in any section's body that begin with
-  "Footgun:" or "Gotcha:". `bin/build-reference.py` lifts them out and
-  renders them as a unified "Pitfalls" section near the bottom of the page.
-
-## Workflow
-
-- Edit a `.md` file. Snippet code, prose, expected outputs — all live here.
-- Run `python3 bin/build-reference.py` to regenerate the local HTML pages.
-- Run `bin/run-snippets.py --check` to verify that snippets still produce
-  the captured stdout. If a change is intentional, `--update` rewrites the
-  expected-output blocks in place.
-
-The generated `docs/reference/<slug>.html` files are **not** checked in —
-they regenerate from these markdown sources on every deploy and are
-listed in the repo `.gitignore`. Treat them as a build artifact, not as
-content. Same for `docs/assets/php-toolkit.zip`, which
-`bin/build-docs-bundle.sh` rebuilds from the toolkit source.
-
-CI runs `bin/run-snippets.py --check` on every PR
-(`.github/workflows/snippet-tests.yml`) and `bin/build-reference.py` on
-every push to `trunk` (`.github/workflows/docs.yml`).
-
-## Tooling notes
-
-`bin/_load_catalog.py` parses these files into the COMPONENTS data
-structure consumed by `bin/build-reference.py` and `bin/run-snippets.py`.
-
-`bin/_extract_catalog.py` is the one-shot tool that produced these files
-from the legacy Python catalog during the migration. It is kept in the
-tree as a regression aid: re-running it after manual edits is a quick way
-to confirm that the catalog state can still round-trip.
diff --git a/bin/_docs_components/_order.txt b/bin/_docs_components/_order.txt
deleted file mode 100644
index 83d3df6eb..000000000
--- a/bin/_docs_components/_order.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-# Component order in the docs site. Edit this file to reorder.
-# Lines beginning with # are comments. Blank lines are ignored.
- -html -zip -bytestream -filesystem -blockparser -markdown -xml -encoding -dataliberation -git -merge -httpclient -httpserver -corsproxy -cli -polyfill -blueprints -coding-standards diff --git a/bin/_docs_components/blockparser.md b/bin/_docs_components/blockparser.md deleted file mode 100644 index 95cb8c22b..000000000 --- a/bin/_docs_components/blockparser.md +++ /dev/null @@ -1,301 +0,0 @@ ---- -slug: blockparser -title: BlockParser -install: wp-php-toolkit/blockparser - -credit_title: WordPress core, packaged standalone -credit_body: | - <code>WP_Block_Parser</code> is WordPress core's block parser, packaged here so importers and linters can read <a href="https://developer.wordpress.org/block-editor/reference-guides/block-api/">block markup</a> without booting WordPress. Source: <a href="https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/class-wp-block-parser.php">WordPress/wordpress-develop</a>. - -see_also: html | HTML | Inspect or rewrite the HTML carried by parsed blocks. -see_also: markdown | Markdown | Move between author-friendly Markdown and serialized block markup. -see_also: dataliberation | DataLiberation | Audit and transform blocks while migrating content. ---- - -WordPress core's block parser, packaged as a standalone library. Turn block markup into a structured tree, lint posts for common authoring mistakes, and audit block usage — all without booting WordPress. - -## Why this exists - -<p>Block markup is not plain HTML. A post can contain HTML comments that identify blocks, JSON attributes inside those comments, freeform HTML between blocks, and nested blocks whose rendered HTML is interleaved with parent markup.</p> - -<p>This component packages WordPress core's block parser so importers, linters, migration tools, and static analyzers can understand block content without loading WordPress. 
It deliberately mirrors core behavior — same array shape, same <code>null</code> blocks for freeform HTML, same core block names such as <code>core/paragraph</code> — so code written against this parser keeps working when run inside WordPress, and vice versa.</p> - -<p>Reach for it when you need answers about the block tree: which blocks a post uses, which attributes they carry, where nested blocks appear, or whether content violates a rule your project cares about.</p> - -## What you get back - -<p><code>WP_Block_Parser::parse()</code> returns an array of blocks. Each block is an associative array with five keys: <code>blockName</code>, <code>attrs</code>, <code>innerBlocks</code>, <code>innerHTML</code>, and <code>innerContent</code>.</p> - -<p><code>innerHTML</code> is the HTML inside the block <em>with inner blocks stripped out</em>. <code>innerContent</code> is the interleaved version: an array of HTML strings with <code>null</code> placeholders marking where each inner block belongs.</p> - -<p>Most code starts by checking <code>blockName</code>, then reading <code>attrs</code> or <code>innerHTML</code>. When a post has container blocks such as Group, Columns, or Navigation, look inside <code>innerBlocks</code> too.</p> - -<p>Footgun: <strong>Freeform HTML between blocks shows up as a block with <code>blockName === null</code>.</strong> Always skip that case before comparing names.</p> - -## Parse a document - -<p>The simplest possible use. Pass a string, get back a tree.</p> - -<!-- snippet: -filename: parse.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = "<!-- wp:heading {\"level\":2} -->\n<h2>Welcome</h2>\n<!-- /wp:heading -->\n\n" - . 
"<!-- wp:paragraph -->\n<p>Hello from the block editor.</p>\n<!-- /wp:paragraph -->"; - -$blocks = ( new WP_Block_Parser() )->parse( $document ); -foreach ( $blocks as $block ) { - if ( null === $block['blockName'] ) { - continue; - } - echo $block['blockName'] . ': ' . trim( strip_tags( $block['innerHTML'] ) ) . "\n"; -} -``` - -<!-- expected-output --> -``` -core/heading: Welcome -core/paragraph: Hello from the block editor. -``` - -## Count every block type in a post - -<p>A common audit task: "How many Paragraph, Image, and Gallery blocks does this post use?" A small queue keeps the example readable while still visiting nested blocks.</p> - -<!-- snippet: -filename: count-blocks.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = "<!-- wp:group --><div class=\"wp-block-group\">" - . "<!-- wp:heading --><h2>Title</h2><!-- /wp:heading -->" - . "<!-- wp:paragraph --><p>One.</p><!-- /wp:paragraph -->" - . "<!-- wp:paragraph --><p>Two.</p><!-- /wp:paragraph -->" - . "<!-- wp:image {\"id\":1} --><figure><img src=\"a.jpg\"/></figure><!-- /wp:image -->" - . "</div><!-- /wp:group -->"; - -$blocks = ( new WP_Block_Parser() )->parse( $document ); - -$counts = array(); -$queue = $blocks; - -while ( ! empty( $queue ) ) { - $block = array_shift( $queue ); - - if ( null !== $block['blockName'] ) { - $name = $block['blockName']; - $counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1; - } - - foreach ( $block['innerBlocks'] as $inner_block ) { - $queue[] = $inner_block; - } -} - -arsort( $counts ); -foreach ( $counts as $name => $n ) { - echo str_pad( (string) $n, 4, ' ', STR_PAD_LEFT ) . ' ' . $name . 
"\n"; -} -``` - -<!-- expected-output --> -``` - 2 core/paragraph - 1 core/group - 1 core/heading - 1 core/image -``` - -## Check whether a post uses a block - -<p>Useful for templates, audits, and migrations: answer one yes/no question without caring where the block appears in the tree.</p> - -<!-- snippet: -filename: has-block.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = "<!-- wp:group --><div class=\"wp-block-group\">" - . "<!-- wp:buttons --><div class=\"wp-block-buttons\">" - . "<!-- wp:button --><div class=\"wp-block-button\"><a>Buy now</a></div><!-- /wp:button -->" - . "</div><!-- /wp:buttons -->" - . "</div><!-- /wp:group -->"; - -$blocks = ( new WP_Block_Parser() )->parse( $document ); - -function post_has_block( $blocks, $name ) { - $queue = $blocks; - - while ( ! empty( $queue ) ) { - $block = array_shift( $queue ); - if ( $name === $block['blockName'] ) { - return true; - } - - foreach ( $block['innerBlocks'] as $inner_block ) { - $queue[] = $inner_block; - } - } - - return false; -} - -echo post_has_block( $blocks, 'core/button' ) ? "has button\n" : "missing button\n"; -echo post_has_block( $blocks, 'core/gallery' ) ? "has gallery\n" : "missing gallery\n"; -``` - -<!-- expected-output --> -``` -has button -missing gallery -``` - -## Lint headings for hierarchy mistakes - -<p>"Don't skip from H2 to H4" is a real accessibility rule. The helper below keeps headings in document order, including headings nested inside Group, Column, and Cover blocks.</p> - -<!-- snippet: -filename: lint-headings.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = "<!-- wp:heading -->\n<h2>Intro</h2>\n<!-- /wp:heading -->" - . "<!-- wp:heading {\"level\":4} -->\n<h4>Subsection</h4>\n<!-- /wp:heading -->" - . 
"<!-- wp:heading {\"level\":3} -->\n<h3>Body</h3>\n<!-- /wp:heading -->"; - -$blocks = ( new WP_Block_Parser() )->parse( $document ); - -function collect_headings( $blocks, &$headings ) { - foreach ( $blocks as $block ) { - if ( 'core/heading' === $block['blockName'] ) { - $headings[] = array( - 'level' => isset( $block['attrs']['level'] ) ? (int) $block['attrs']['level'] : 2, - 'text' => trim( strip_tags( $block['innerHTML'] ) ), - ); - } - - collect_headings( $block['innerBlocks'], $headings ); - } -} - -$headings = array(); -collect_headings( $blocks, $headings ); - -$last = 1; -foreach ( $headings as $heading ) { - $level = $heading['level']; - $label = $heading['text']; - - if ( $level > $last + 1 ) { - echo "WARN {$label}: jumped from H{$last} to H{$level}\n"; - } else { - echo "ok {$label}: H{$level}\n"; - } - $last = $level; -} -``` - -<!-- expected-output --> -``` -ok Intro: H2 -WARN Subsection: jumped from H2 to H4 -ok Body: H3 -``` - -## Find all instances of a custom block - -<p>When auditing an export for a block your plugin owns, collect every match and print the fields a human cares about.</p> - -<!-- snippet: -filename: find-custom-block.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = "<!-- wp:paragraph --><p>Reviews</p><!-- /wp:paragraph -->" - . "<!-- wp:my-plugin/testimonial {\"author\":\"Jane\",\"rating\":5} -->" - . "<blockquote>Loved it.</blockquote>" - . "<!-- /wp:my-plugin/testimonial -->" - . "<!-- wp:my-plugin/testimonial {\"author\":\"Joe\",\"rating\":4} -->" - . "<blockquote>Pretty good.</blockquote>" - . 
"<!-- /wp:my-plugin/testimonial -->"; - -$blocks = ( new WP_Block_Parser() )->parse( $document ); - -function find_blocks_by_name( $blocks, $name, &$matches ) { - foreach ( $blocks as $block ) { - if ( $name === $block['blockName'] ) { - $matches[] = $block; - } - - find_blocks_by_name( $block['innerBlocks'], $name, $matches ); - } -} - -$testimonials = array(); -find_blocks_by_name( $blocks, 'my-plugin/testimonial', $testimonials ); - -foreach ( $testimonials as $i => $b ) { - echo ( $i + 1 ) . '. ' . $b['attrs']['author'] . ' (' . $b['attrs']['rating'] . '/5): ' - . trim( strip_tags( $b['innerHTML'] ) ) . "\n"; -} -``` - -<!-- expected-output --> -``` -1. Jane (5/5): Loved it. -2. Joe (4/5): Pretty good. -``` - -## Detect blocks with stale embed URLs - -<p>A real-world content audit: find every <code>core/embed</code> whose URL points at a domain you have retired.</p> - -<!-- snippet: -filename: audit-embeds.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$document = <<<'HTML' -<!-- wp:embed {"url":"https://twitter.com/wordpress/status/1","providerNameSlug":"twitter"} /--> -<!-- wp:embed {"url":"https://youtube.com/watch?v=abc","providerNameSlug":"youtube"} /--> -<!-- wp:embed {"url":"https://vine.co/v/xyz","providerNameSlug":"vine"} /--> -HTML; - -$retired = array( 'vine.co', 'plus.google.com' ); - -foreach ( ( new WP_Block_Parser() )->parse( $document ) as $b ) { - if ( 'core/embed' !== $b['blockName'] ) { - continue; - } - $url = isset( $b['attrs']['url'] ) ? $b['attrs']['url'] : ''; - $host = parse_url( $url, PHP_URL_HOST ); - $bad = $host && in_array( $host, $retired, true ); - echo ( $bad ? 'STALE ' : 'ok ' ) . $url . 
"\n"; -} -``` - -<!-- expected-output --> -``` -ok https://twitter.com/wordpress/status/1 -ok https://youtube.com/watch?v=abc -STALE https://vine.co/v/xyz -``` diff --git a/bin/_docs_components/blueprints.md b/bin/_docs_components/blueprints.md deleted file mode 100644 index 95bbe7ca5..000000000 --- a/bin/_docs_components/blueprints.md +++ /dev/null @@ -1,200 +0,0 @@ ---- -slug: blueprints -title: Blueprints -install: wp-php-toolkit/blueprints - -see_also: filesystem | Filesystem | Prepare files and fixtures before applying site setup steps. -see_also: httpclient | HttpClient | Download packages or source data as part of provisioning workflows. -see_also: cli | CLI | Wrap repeatable blueprint operations in a small command. ---- - -Declarative WordPress site provisioning. Write a JSON description of plugins, options, and content; let the runner execute it. - -## Why this exists - -<p>A WordPress environment is more than a database dump. It can require a specific core version, plugins, themes, site options, uploaded files, content, and setup steps. Rebuilding that by hand makes demos, tests, bug reports, workshops, and CI fixtures drift over time.</p> - -<p>The Blueprints component treats site setup as data. A blueprint JSON document describes the desired steps, and the runner applies them to either a new WordPress install or an existing one. The validator exists because user-authored JSON needs clear, path-specific errors rather than generic schema failures.</p> - -<p><code>RunnerConfiguration</code> separates the web root from the WordPress core directory, since real hosts often put them in different places. Both paths are explicit on the runner, never inferred.</p> - -<p>Blueprints can <em>create</em> a new WordPress install (download core, set up the database, apply steps) or <em>apply to an existing</em> site. 
Creating a fresh install needs filesystem access this in-browser runtime doesn't have, so the runnable snippets focus on <code>APPLY_TO_EXISTING_SITE</code>.</p> - -## Configure a runner for an existing site - -<p><code>RunnerConfiguration</code> is a fluent builder. The minimum: target site root, target site URL, execution mode.</p> - -<!-- snippet: -filename: configure.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\Blueprints\Runner; -use WordPress\Blueprints\RunnerConfiguration; - -$config = ( new RunnerConfiguration() ) - ->set_execution_mode( Runner::EXECUTION_MODE_APPLY_TO_EXISTING_SITE ) - ->set_target_site_root( '/wordpress' ) - ->set_target_site_url( 'http://playground.test/' ); - -echo "mode: " . $config->get_execution_mode() . "\n"; -echo "root: " . $config->get_target_site_root() . "\n"; -echo "url: " . $config->get_target_site_url() . "\n"; -``` - -<!-- expected-output --> -``` -mode: apply-to-existing-site -root: /wordpress -url: http://playground.test/ -``` - -## Generate blueprint JSON from PHP - -<p>CI jobs and tests stay clearer when PHP builds the blueprint from data instead of hand-writing JSON. 
Keep the structure plain: <code>version</code>, then a list of step arrays.</p> - -<!-- snippet: -filename: build-json.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -$site_name = 'Demo Site'; -$plugins = array( 'gutenberg', 'classic-editor' ); - -$blueprint = array( - 'version' => 2, - 'steps' => array( - array( - 'step' => 'setSiteOptions', - 'options' => array( - 'blogname' => $site_name, - 'permalink_structure' => '/%postname%/', - 'show_on_front' => 'page', - ), - ), - ), -); - -foreach ( $plugins as $slug ) { - $blueprint['steps'][] = array( - 'step' => 'installPlugin', - 'pluginData' => "https://downloads.wordpress.org/plugin/{$slug}.zip", - ); - $blueprint['steps'][] = array( - 'step' => 'activatePlugin', - 'plugin' => "{$slug}/{$slug}.php", - ); -} - -echo json_encode( $blueprint, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES ) . "\n"; -``` - -<!-- expected-output --> -``` -{ - "version": 2, - "steps": [ - { - "step": "setSiteOptions", - "options": { - "blogname": "Demo Site", - "permalink_structure": "/%postname%/", - "show_on_front": "page" - } - }, - { - "step": "installPlugin", - "pluginData": "https://downloads.wordpress.org/plugin/gutenberg.zip" - }, - { - "step": "activatePlugin", - "plugin": "gutenberg/gutenberg.php" - }, - { - "step": "installPlugin", - "pluginData": "https://downloads.wordpress.org/plugin/classic-editor.zip" - }, - { - "step": "activatePlugin", - "plugin": "classic-editor/classic-editor.php" - } - ] -} -``` - -## Validate before running - -<p>The schema validator returns a human-readable <code>ValidationError</code> instead of a generic "does not match schema" failure. 
Use it before handing user-authored JSON to a runner.</p> - -<!-- snippet: -filename: validate.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\Blueprints\Validator\HumanFriendlySchemaValidator; - -$schema = array( - 'type' => 'object', - 'required' => array( 'version', 'steps' ), - 'properties' => array( - 'version' => array( 'type' => 'integer' ), - 'steps' => array( - 'type' => 'array', - 'items' => array( - 'type' => 'object', - 'required' => array( 'step' ), - 'properties' => array( - 'step' => array( 'type' => 'string' ), - ), - ), - ), - ), -); - -$blueprint = array( - 'version' => 2, - 'steps' => array( - array( 'pluginData' => 'https://downloads.wordpress.org/plugin/gutenberg.zip' ), - ), -); - -$error = ( new HumanFriendlySchemaValidator( $schema ) )->validate( $blueprint ); -if ( null === $error ) { - echo "valid\n"; -} else { - echo $error->get_pretty_path() . ": " . $error->message . "\n"; -} -``` - -<!-- expected-output --> -``` -Blueprint root["steps"][0]: Missing required field: step. -``` - -## The Blueprint JSON shape - -<p>A blueprint is a JSON document with a <code>version</code> field and a <code>steps</code> array. Each step has a <code>"step"</code> discriminator and step-specific fields. 
This is the same shape used by <a href="https://playground.wordpress.net/">WordPress Playground</a>.</p> - -<pre><code>{ - "version": 2, - "steps": [ - { "step": "setSiteOptions", - "options": { - "blogname": "Demo Site", - "permalink_structure": "/%postname%/" - } }, - { "step": "installPlugin", - "pluginData": "https://downloads.wordpress.org/plugin/gutenberg.zip" }, - { "step": "activatePlugin", - "plugin": "gutenberg/gutenberg.php" } - ] -}</code></pre> diff --git a/bin/_docs_components/bytestream.md b/bin/_docs_components/bytestream.md deleted file mode 100644 index 0a40453c8..000000000 --- a/bin/_docs_components/bytestream.md +++ /dev/null @@ -1,203 +0,0 @@ ---- -slug: bytestream -title: ByteStream -install: wp-php-toolkit/bytestream - -see_also: filesystem | Filesystem | Back file reads and writes with the same stream primitives. -see_also: zip | Zip | Read and write archive entries one stream at a time. -see_also: httpclient | HttpClient | Process request and response bodies incrementally. ---- - -Composable streaming primitives for reading, writing, transforming, hashing, and compressing byte data. Pull/peek/consume semantics let parsers backtrack without copying, and deflate, inflate, and checksum filters snap together like Lego. - -## Why this exists - -<p>PHP's native streams are powerful but inconsistent. <code>fread</code> on a socket may return short reads with no warning; <code>stream_filter_append</code> is awkward to compose; gzip helpers and file handles expose different APIs. The ByteStream component normalizes these behind one small interface — <code>pull / peek / consume</code> — so a parser, a hash function, and a deflate filter all see the same shape.</p> - -<p>The split between <em>pull</em> (buffer up to N bytes) and <em>consume</em> (advance past N bytes) is the secret. 
Parsers can <code>peek</code> ahead to detect a record boundary and decide whether to <code>consume</code>, without copying or allocating.</p> - -## Read a file in chunks - -<p>The canonical loop. <code>pull(N)</code> reads up to <code>N</code> bytes from the underlying source into an internal buffer and returns how many ended up there; <code>consume(N)</code> reads <code>N</code> bytes from that buffer and advances past them. The buffer never grows beyond the chunk size you ask for.</p> - -<!-- snippet: -filename: teaser-read.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\ReadStream\FileReadStream; - -$path = tempnam( sys_get_temp_dir(), 'demo' ); -file_put_contents( $path, str_repeat( "log line\n", 200 ) ); - -$reader = FileReadStream::from_path( $path ); -$total = 0; -while ( ! $reader->reached_end_of_data() ) { - $n = $reader->pull( 256 ); - if ( 0 === $n ) break; - $total += strlen( $reader->consume( $n ) ); -} -$reader->close_reading(); -echo "Read {$total} bytes in 256-byte chunks.\n"; -``` - -<!-- expected-output --> -``` -Read 1800 bytes in 256-byte chunks. -``` - -## MemoryPipe as write-then-read buffer - -<p><code>MemoryPipe</code> is bidirectional: you <code>append_bytes()</code> as a writer and <code>pull/consume</code> as a reader. Easiest way to wire one component's output into another's input.</p> - -<p>Gotcha: <strong>A producer must call <code>close_writing()</code> when done — otherwise the consumer eventually throws <code>NotEnoughDataException</code> instead of seeing EOF.</strong> </p> - -<!-- snippet: -filename: memory-pipe.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\MemoryPipe; - -$pipe = new MemoryPipe(); -$pipe->append_bytes( "first chunk\n" ); -$pipe->append_bytes( "second chunk\n" ); -$pipe->append_bytes( "third chunk\n" ); -$pipe->close_writing(); - -while ( ! 
$pipe->reached_end_of_data() ) { - $n = $pipe->pull( 1024 ); - if ( 0 === $n ) break; - echo "got: " . $pipe->consume( $n ); -} -``` - -<!-- expected-output --> -``` -got: first chunk -second chunk -third chunk -``` - -## Compress on the way in, decompress on the way out - -<p>Wrap a stream in <code>DeflateReadStream</code> to get compressed bytes out; wrap it in <code>InflateReadStream</code> to get decompressed bytes out. Both are full <code>ByteReadStream</code> implementations, so they nest into anything else that takes a stream.</p> - -<!-- snippet: -filename: deflate-roundtrip.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\MemoryPipe; -use WordPress\ByteStream\ReadStream\DeflateReadStream; -use WordPress\ByteStream\ReadStream\InflateReadStream; - -$original = str_repeat( "the quick brown fox. ", 50 ); - -$src = new MemoryPipe( $original ); -$src->close_writing(); -$deflated = new DeflateReadStream( $src, ZLIB_ENCODING_DEFLATE ); -$compressed = $deflated->consume_all(); - -$src2 = new MemoryPipe( $compressed ); -$src2->close_writing(); -$inflated = new InflateReadStream( $src2, ZLIB_ENCODING_DEFLATE ); -$round = $inflated->consume_all(); - -printf( "original : %d bytes\n", strlen( $original ) ); -printf( "deflated : %d bytes (%.1f%%)\n", strlen( $compressed ), 100 * strlen( $compressed ) / strlen( $original ) ); -printf( "round-trip: %s\n", $round === $original ? 'OK' : 'BROKEN' ); -``` - -<!-- expected-output --> -``` -original : 1050 bytes -deflated : 45 bytes (4.3%) -round-trip: OK -``` - -## Line-by-line reads from a chunked source - -<p>Reading text by line means handling chunk boundaries that fall mid-line. Keep the trailing partial line and prepend it to the next pull. 
The rest of the loop pretends the data was always whole.</p> - -<!-- snippet: -filename: lines.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\MemoryPipe; - -$pipe = new MemoryPipe(); -$pipe->append_bytes( "alpha\nbravo\ncharl" ); -$pipe->append_bytes( "ie\ndelta\necho\n" ); -$pipe->close_writing(); - -$tail = ''; -$count = 0; -while ( ! $pipe->reached_end_of_data() ) { - $n = $pipe->pull( 8 ); - if ( 0 === $n ) break; - $buf = $tail . $pipe->consume( $n ); - $lines = explode( "\n", $buf ); - $tail = array_pop( $lines ); - foreach ( $lines as $line ) { - printf( "[%d] %s\n", ++$count, $line ); - } -} -if ( '' !== $tail ) { - printf( "[%d] %s\n", ++$count, $tail ); -} -``` - -<!-- expected-output --> -``` -[1] alpha -[2] bravo -[3] charlie -[4] delta -[5] echo -``` - -## Limit a stream to a fixed window - -<p><code>LimitedByteReadStream</code> exposes only the next N bytes of an underlying stream as if those were the entire stream. This is how the ZIP decoder hands you the body of one entry without letting you read into the next.</p> - -<!-- snippet: -filename: limited.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\MemoryPipe; -use WordPress\ByteStream\ReadStream\LimitedByteReadStream; - -$source = new MemoryPipe( "HEADER:42|BODY:hello there|FOOTER:done" ); -$source->close_writing(); - -$source->pull( 10 ); -$source->consume( 10 ); - -$body = new LimitedByteReadStream( $source, 16 ); -echo "body sees: " . $body->consume_all() . "\n"; -echo "remaining in source: " . $source->consume_all() . 
"\n"; -``` - -<!-- expected-output --> -``` -body sees: BODY:hello there -remaining in source: |FOOTER:done -``` diff --git a/bin/_docs_components/cli.md b/bin/_docs_components/cli.md deleted file mode 100644 index 017eeaf18..000000000 --- a/bin/_docs_components/cli.md +++ /dev/null @@ -1,232 +0,0 @@ ---- -slug: cli -title: CLI -install: wp-php-toolkit/cli - -see_also: filesystem | Filesystem | Keep command behavior testable with in-memory storage. -see_also: blueprints | Blueprints | Build repeatable site setup commands around parsed options. -see_also: httpserver | HttpServer | Add a local web UI to a CLI workflow. ---- - -POSIX-style argument parser. Long options, short bundles, inline values, positional args — one static call. - -## Why this exists - -<p>Real CLI tools in PHP usually mean either pulling in <code>symfony/console</code> (and the transitive dependencies that come with it) or hand-rolling argv parsing that breaks the first time someone writes <code>-vvv</code> or <code>--port=8080</code>. The toolkit's <code>CLI</code> class is one static method, no dependencies, and handles the POSIX shapes you actually see.</p> - -## Parse a single flag - -<p>The smallest useful invocation: one boolean flag, one positional. Each option is a four-tuple of <code>[ short, has_value, default, description ]</code>.</p> - -<!-- snippet: -filename: parse-flag.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\CLI\CLI; - -$option_defs = array( - 'verbose' => array( 'v', false, false, 'Enable verbose output' ), -); - -list( $positionals, $options ) = CLI::parse_command_args_and_options( - array( '-v', 'input.txt' ), - $option_defs -); - -echo "verbose: " . ( $options['verbose'] ? 'yes' : 'no' ) . "\n"; -echo "input: " . $positionals[0] . 
"\n"; -``` - -<!-- expected-output --> -``` -verbose: yes -input: input.txt -``` - -## Mix values, flags, and bundles - -<p>The parser accepts <code>--port 8080</code>, <code>--port=8080</code>, <code>-p 8080</code>, and <code>-p=8080</code>. It also expands bundled boolean shorts such as <code>-afv</code>.</p> - -<!-- snippet: -filename: mix-shapes.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\CLI\CLI; - -$option_defs = array( - 'all' => array( 'a', false, false, 'Process everything' ), - 'force' => array( 'f', false, false, 'Overwrite existing files' ), - 'verbose' => array( 'v', false, false, 'Verbose output' ), - 'output' => array( 'o', true, null, 'Output path' ), - 'port' => array( 'p', true, '3000', 'Server port' ), -); - -$argv = array( '-afv', '--port=8080', '-o', '/tmp/result.txt', 'input.json' ); -list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); - -echo "input: " . $positionals[0] . "\n"; -echo "flags: " . implode( ', ', array_keys( array_filter( array( - 'all' => $options['all'], - 'force' => $options['force'], - 'verbose' => $options['verbose'], -) ) ) ) . "\n"; -echo "output: " . $options['output'] . "\n"; -echo "port: " . $options['port'] . "\n"; -``` - -<!-- expected-output --> -``` -input: input.json -flags: all, force, verbose -output: /tmp/result.txt -port: 8080 -``` - -## Validate required options - -<p>The parser fills in defaults but never enforces "required".
Check for <code>null</code> after parsing — full control over the error message.</p> - -<!-- snippet: -filename: require-options.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\CLI\CLI; - -$option_defs = array( - 'site-url' => array( 'u', true, null, 'Public site URL (required)' ), - 'site-path' => array( null, true, null, 'Target directory (required)' ), -); - -$argv = array( '--site-url', 'https://mysite.test' ); - -try { - list( , $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); - foreach ( array( 'site-url', 'site-path' ) as $name ) { - if ( null === $options[ $name ] ) { - throw new RuntimeException( "Missing required option --{$name}" ); - } - } - echo "All good.\n"; -} catch ( Exception $e ) { - echo "error: " . $e->getMessage() . "\n"; -} -``` - -<!-- expected-output --> -``` -error: Missing required option --site-path -``` - -## Generate --help from definitions - -<p>Because each option carries its own description, you can render help text by walking the same definitions you parse with. No second source of truth.</p> - -<!-- snippet: -filename: help-text.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\CLI\CLI; - -$option_defs = array( - 'output' => array( 'o', true, null, 'Write result to FILE' ), - 'force' => array( 'f', false, false, 'Overwrite existing files' ), - 'verbose' => array( 'v', false, false, 'Verbose output' ), - 'help' => array( 'h', false, false, 'Show this help and exit' ), -); - -function render_help( array $defs ) { - echo "Usage: mytool [options] <input>\n\nOptions:\n"; - foreach ( $defs as $long => $def ) { - list( $short, $has_value, $default, $desc ) = $def; - $flag = ( $short ? "-{$short}, " : ' ' ) . 
"--{$long}"; - if ( $has_value ) $flag .= '=VALUE'; - echo sprintf( " %-28s %s\n", $flag, $desc ); - } -} - -list( , $options ) = CLI::parse_command_args_and_options( array( '-h' ), $option_defs ); -if ( $options['help'] ) render_help( $option_defs ); -``` - -<!-- expected-output --> -``` -Usage: mytool [options] <input> - -Options: - -o, --output=VALUE Write result to FILE - -f, --force Overwrite existing files - -v, --verbose Verbose output - -h, --help Show this help and exit -``` - -## Git-style subcommands - -<p>To build a tool with subcommands like <code>mytool deploy</code>, peel the first positional off <code>argv</code>, dispatch, and parse the rest with a per-command option set.</p> - -<!-- snippet: -filename: subcommands.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\CLI\CLI; - -$commands = array( - 'deploy' => array( - 'env' => array( 'e', true, 'staging', 'Target environment' ), - 'dry-run' => array( 'n', false, false, 'Preview without applying' ), - ), - 'rollback' => array( - 'to' => array( 't', true, null, 'Revision to roll back to' ), - ), -); - -function run( array $argv, array $commands ) { - if ( empty( $argv ) ) { - echo "Usage: mytool <command> [options]\nCommands: " . implode( ', ', array_keys( $commands ) ) . "\n"; - return; - } - $command = array_shift( $argv ); - if ( ! isset( $commands[ $command ] ) ) { - echo "Unknown command: {$command}\n"; - return; - } - list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $commands[ $command ] ); - echo "command={$command}\n"; - echo "options: " . json_encode( $options ) . "\n"; - echo "positionals: " . json_encode( $positionals ) . 
"\n"; -} - -run( array( 'deploy', '--env=production', '-n', 'web-01', 'web-02' ), $commands ); -echo "---\n"; -run( array( 'rollback', '-t', 'abc123' ), $commands ); -``` - -<!-- expected-output --> -``` -command=deploy -options: {"env":"production","dry-run":true} -positionals: ["web-01","web-02"] ---- -command=rollback -options: {"to":"abc123"} -positionals: [] -``` diff --git a/bin/_docs_components/coding-standards.md b/bin/_docs_components/coding-standards.md deleted file mode 100644 index d7b9ffe6d..000000000 --- a/bin/_docs_components/coding-standards.md +++ /dev/null @@ -1,66 +0,0 @@ ---- -slug: coding-standards -title: ToolkitCodingStandards -install: wp-php-toolkit/toolkit-coding-standards - -see_also: polyfill | Polyfill | Share WordPress-style compatibility expectations across standalone packages. ---- - -PHP_CodeSniffer sniffs used by this project: enforce Yoda comparisons and ban the short ternary where it hides falsy-value bugs. - -## Why this exists - -<p>This package is not a general-purpose style guide. It holds project-specific PHP_CodeSniffer rules for review comments the toolkit wants automated: comparisons should follow the WordPress Yoda style, and short ternaries should not hide whether a fallback is meant for <code>null</code> only or for all falsy values.</p> - -<p>Use it in this monorepo, or in a project that intentionally wants the same review tradeoffs. If your project does not follow WordPress-style comparisons, the Yoda sniff is probably the wrong rule for you.</p> - -## Reference the standard from your phpcs.xml - -<p>The component is a PHPCS ruleset, so the useful examples are configuration and before/after code rather than runtime snippets. 
Activate both sniffs at once by referencing <code>WordPressToolkitCodingStandards</code>:</p> - -<pre><code><?xml version="1.0"?> -<ruleset name="My Project"> - <file>src/</file> - - <!-- Activate both toolkit sniffs --> - <rule ref="WordPressToolkitCodingStandards"/> - - <!-- Or pick them individually --> - <!-- <rule ref="WordPressToolkitCodingStandards.PHP.EnforceYodaComparison"/> --> - <!-- <rule ref="WordPressToolkitCodingStandards.PHP.DisallowShortTernary"/> --> -</ruleset></code></pre> - -<p>Then run phpcs and phpcbf the usual way:</p> - -<pre><code>vendor/bin/phpcs --standard=phpcs.xml . -vendor/bin/phpcbf --standard=phpcs.xml .</code></pre> - -## EnforceYodaComparison: catches accidental assignment - -<p>Yoda comparisons (<code>true === $x</code>) make typo-induced assignments easier to catch and match the WordPress style used throughout the toolkit:</p> - -<pre><code>// Bug: single = inside a condition. Always truthy, mutates $status. -if ( $status = 'published' ) { - publish_post( $post ); -} - -// Yoda style: writing this typo would be a parse error. -if ( 'published' === $status ) { - publish_post( $post ); -}</code></pre> - -<p>The sniff covers <code>===</code>, <code>!==</code>, <code>==</code>, and <code>!=</code>, and stays quiet when both sides are dynamic.</p> - -## Why ban the short ternary - -<p>Developers confuse the short ternary (<code>$a ?: $b</code>) with the null-coalescing operator (<code>$a ?? $b</code>). They differ on falsy-but-not-null values: <code>0 ?: 'fallback'</code> returns <code>'fallback'</code>, but <code>0 ?? 'fallback'</code> returns <code>0</code>. The sniff bans <code>?:</code> entirely so reviewers don't have to relitigate this on every PR.</p> - -## Review-friendly replacements - -<p>When the fallback should apply only to <code>null</code>, use <code>??</code>. 
When the fallback should apply to every falsy value, write the full ternary so the intent is visible in review.</p> - -<pre><code>// Only missing values fall back. 0 and "" are preserved. -$limit = $request_limit ?? 20; - -// Any falsy value falls back. The duplicated condition is intentional. -$title = $raw_title ? $raw_title : 'Untitled';</code></pre> diff --git a/bin/_docs_components/corsproxy.md b/bin/_docs_components/corsproxy.md deleted file mode 100644 index 770d2d164..000000000 --- a/bin/_docs_components/corsproxy.md +++ /dev/null @@ -1,151 +0,0 @@ ---- -slug: corsproxy -title: CORSProxy -install: wp-php-toolkit/corsproxy - -see_also: httpclient | HttpClient | Fetch upstream responses from PHP when browser CORS blocks direct access. -see_also: httpserver | HttpServer | Understand the local-server shape before deploying a proxy endpoint. ---- - -A small PHP CORS proxy intended for browser-side code that needs to reach servers without CORS headers. - -## Why this exists - -<p>A Playground-style browser tool reads <code>https://api.github.com/repos/WordPress/php-toolkit</code>, a plugin ZIP from <code>downloads.wordpress.org</code>, or a raw fixture from GitHub. The browser blocks the response when the upstream server does not send the required CORS headers, even though PHP can fetch the same public URL server-side.</p> - -<p>The CORSProxy component is that server-side bridge. It accepts a target URL, fetches it from PHP, and returns a browser-readable response. Because an open proxy is a security and abuse risk, real deployments should add host allowlists, rate limits, header controls, and private-network protections appropriate to their environment.</p> - -## Run the proxy locally - -<p class="callout"><strong>Run on your machine:</strong> the proxy needs to listen on a port. 
Start PHP's built-in server and request any HTTPS URL through it.</p> - -<pre><code>PLAYGROUND_CORS_PROXY_DISABLE_RATE_LIMIT=1 \ - php -S 127.0.0.1:5263 vendor/wp-php-toolkit/corsproxy/cors-proxy.php - -# In another terminal: -curl -s "http://127.0.0.1:5263/cors-proxy.php/https://api.github.com/repos/WordPress/php-toolkit" | head -</code></pre> - -## Production rate limiting - -<p>Drop a <code>cors-proxy-config.php</code> next to <code>cors-proxy.php</code>. If that file defines a <code>playground_cors_proxy_maybe_rate_limit()</code> function, the proxy calls it before forwarding any request — your one chance to reject early. Without the file, the proxy applies its default rate limiter, which is fine for development but should be replaced for any deployment that gets real traffic.</p> - -<p>This example uses a per-IP token bucket stored on disk. Replace with Redis or memcached for multi-host deployments.</p> - -<!-- snippet: -filename: cors-proxy-config.php -runnable: false ---> -```php -<?php -// cors-proxy-config.php — placed next to cors-proxy.php. - -function playground_cors_proxy_maybe_rate_limit() { - $ip = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '0.0.0.0'; - $bucket = sys_get_temp_dir() . '/cors-rl-' . md5( $ip ); - $now = time(); - $window = 60; - $max_req = 30; - - $hits = array(); - if ( file_exists( $bucket ) ) { - $hits = json_decode( file_get_contents( $bucket ), true ); - if ( ! is_array( $hits ) ) $hits = array(); - } - $hits = array_filter( $hits, function ( $t ) use ( $now, $window ) { - return $t > $now - $window; - } ); - - if ( count( $hits ) >= $max_req ) { - header( 'Retry-After: ' . $window ); - http_response_code( 429 ); - echo 'Rate limit exceeded'; - exit; - } - - $hits[] = $now; - file_put_contents( $bucket, json_encode( array_values( $hits ) ) ); -} - -echo "Config loaded — rate limiter armed.\n"; -``` - -## Allowlist upstream hosts - -<p>Out of the box the proxy will fetch any public URL. 
Most real deployments want a fixed list of upstreams — GitHub, Packagist, wp.org. Both the rate-limit logic and the allowlist live in the same hook, since <code>cors-proxy.php</code> only calls <code>playground_cors_proxy_maybe_rate_limit()</code> once. The example below shows just the allowlist concern; in practice you stack both in one function inside <code>cors-proxy-config.php</code>.</p> - -<!-- snippet: -filename: cors-proxy-config-allowlist.php -runnable: false ---> -```php -<?php -// cors-proxy-config.php — combine with the rate-limit example above. - -function playground_cors_proxy_maybe_rate_limit() { - $allow = array( - 'api.github.com', - 'raw.githubusercontent.com', - 'codeload.github.com', - 'repo.packagist.org', - 'downloads.wordpress.org', - 'api.wordpress.org', - ); - - $target = isset( $_SERVER['PATH_INFO'] ) ? $_SERVER['PATH_INFO'] : ( '/' . ( isset( $_SERVER['QUERY_STRING'] ) ? $_SERVER['QUERY_STRING'] : '' ) ); - $target = ltrim( $target, '/' ); - $host = parse_url( $target, PHP_URL_HOST ); - - if ( ! $host || ! in_array( strtolower( $host ), $allow, true ) ) { - http_response_code( 403 ); - header( 'Content-Type: text/plain' ); - echo "Upstream not allowed: " . ( $host ? $host : '(none)' ); - exit; - } -} - -echo "Allowlist config active.\n"; -``` - -## Browser-side fetch through the proxy - -<p>Once deployed, the client side is just <code>fetch()</code> with the proxy URL. 
Drop this into any HTML page.</p> - -<pre><code>const PROXY = "https://cors.example.com/cors-proxy.php"; - -async function viaProxy(url, init = {}) { - const res = await fetch(`${PROXY}/${url}`, { - ...init, - headers: { - ...(init.headers || {}), - "X-Cors-Proxy-Allowed-Request-Headers": "Authorization", - }, - }); - if (!res.ok) throw new Error(`Proxy returned ${res.status}`); - return res; -} - -const repo = await viaProxy("https://api.github.com/repos/WordPress/php-toolkit").then(r => r.json()); -console.log(repo.full_name, repo.stargazers_count); -</code></pre> - -## Deploy behind nginx - -<p>The proxy is a single PHP script — any SAPI works. nginx + php-fpm is a common production setup. <code>PATH_INFO</code> is what the proxy reads to learn the target URL.</p> - -<pre><code>server { - listen 443 ssl http2; - server_name cors.example.com; - - root /var/www/cors-proxy; - index cors-proxy.php; - - location ~ ^/cors-proxy\.php(/.*)?$ { - fastcgi_pass unix:/run/php/php8.1-fpm.sock; - fastcgi_split_path_info ^(.+\.php)(/.*)$; - fastcgi_param SCRIPT_FILENAME $document_root/cors-proxy.php; - fastcgi_param PATH_INFO $fastcgi_path_info; - include fastcgi_params; - } -} -</code></pre> diff --git a/bin/_docs_components/dataliberation.md b/bin/_docs_components/dataliberation.md deleted file mode 100644 index 6dd8b02d8..000000000 --- a/bin/_docs_components/dataliberation.md +++ /dev/null @@ -1,282 +0,0 @@ ---- -slug: dataliberation -title: DataLiberation -install: wp-php-toolkit/data-liberation - -see_also: ../learn/03-importing-content.html | Tutorial — Markdown to WXR | The chapter that walks through importing a folder of Markdown files into WordPress via the toolkit. -see_also: markdown | Markdown | Use Markdown as a source or destination format. -see_also: blockparser | BlockParser | Analyze serialized blocks inside post content. -see_also: httpclient | HttpClient | Download media and remote source data while importing. ---- - -Streaming WordPress import/export. 
WXR, SQL, block markup — without loading whole datasets into memory. - -## Why this exists - -<p>WordPress content should be portable, but real migrations cross several formats. A site export might arrive as WXR, a Markdown folder, or entities from another CMS. URLs can hide in block attributes, HTML, CSS, feeds, GUIDs, and post meta. Importers must also resume after a failed media download or upload.</p> - -<p>The DataLiberation component streams WordPress-shaped data through readers, transformers, and writers. It models posts, terms, comments, attachments, and metadata as <code>ImportEntity</code> objects, then lets a pipeline rewrite each entity without loading the full export into memory.</p> - -<p>The API reflects specific migration bugs: relative URLs in known block attributes, URLs inside inline CSS, self-closing block comments that must keep their shape, and origin-only URLs whose trailing slash style should not change during a rewrite.</p> - -<p>Reach for it when the job combines formats: build WXR from another CMS, rewrite a staging export for production, frontload remote assets, or compose Markdown, XML, HTML, CSS, and URL rewriting into one pipeline.</p> - -## Write a WXR file in five lines - -<p>Stream a single post into a WXR document via <code>WXRWriter</code>. 
The writer holds no buffer beyond what is needed to close currently-open tags, so memory stays flat regardless of input size.</p> - -<!-- snippet: -filename: wxr-quickstart.php -runnable: true ---> -```php -<?php -require '/wordpress/wp-content/php-toolkit/vendor/autoload.php'; - -use WordPress\ByteStream\MemoryPipe; -use WordPress\DataLiberation\EntityWriter\WXRWriter; -use WordPress\DataLiberation\ImportEntity; - -$pipe = new MemoryPipe(); -$writer = new WXRWriter( $pipe ); -$writer->append_entity( new ImportEntity( 'post', array( - 'post_title' => 'Hello', - 'content' => 'World.', - 'post_id' => '1', - 'status' => 'publish', -) ) ); -$writer->finalize(); -$writer->close_writing(); -$pipe->close_writing(); -$wxr = $pipe->consume_all(); - -echo "bytes: " . strlen( $wxr ) . "\n"; -echo false !== strpos( $wxr, '<title>Hello' ) ? "title exported\n" : "title missing\n"; -echo false !== strpos( $wxr, 'publish' ) ? "status exported\n" : "status missing\n"; -``` - - -``` -bytes: 475 -title exported -status exported -``` - -## Build a WXR programmatically from any source - -

-<p>The writer doesn't care where entities come from. Loop over rows from a CMS, a CSV, or a Notion API dump and emit posts plus their meta and comments.</p>
-
-<!-- snippet:
-filename: wxr-from-rows.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-use WordPress\ByteStream\MemoryPipe;
-use WordPress\DataLiberation\EntityWriter\WXRWriter;
-use WordPress\DataLiberation\ImportEntity;
-
-$rows = array(
-    array( 'id' => 10, 'title' => 'About', 'body' => '<p>About us.</p>', 'tags' => array( 'company' ) ),
-    array( 'id' => 11, 'title' => 'Blog', 'body' => '<p>Hello world.</p>', 'tags' => array( 'news', 'launch' ) ),
-);
-
-$pipe = new MemoryPipe();
-$writer = new WXRWriter( $pipe );
-
-foreach ( $rows as $row ) {
-    $writer->append_entity( new ImportEntity( 'post', array(
-        'post_id' => (string) $row['id'],
-        'post_title' => $row['title'],
-        'content' => $row['body'],
-        'status' => 'publish',
-        'post_type' => 'post',
-    ) ) );
-    foreach ( $row['tags'] as $i => $tag ) {
-        $writer->append_entity( new ImportEntity( 'term', array(
-            'term_id' => (string) ( $row['id'] * 100 + $i ),
-            'taxonomy' => 'post_tag',
-            'slug' => $tag,
-            'parent' => '0',
-        ) ) );
-    }
-}
-
-$writer->finalize();
-$writer->close_writing();
-$pipe->close_writing();
-
-$wxr = $pipe->consume_all();
-echo "items: " . substr_count( $wxr, '<item>' ) . "\n";
-echo "terms: " . substr_count( $wxr, '<wp:term>' ) . "\n";
-echo false !== strpos( $wxr, 'Blog' ) ? "Blog post exported\n" : "Blog post missing\n";
-```
-
-<!-- expected-output -->
-```
-items: 2
-terms: 3
-Blog post exported
-```
-
-## Read entities from a WXR file with constant memory
-

-<p><code>WXREntityReader</code> emits one entity at a time. A 10 GB WXR uses the same memory as a 10 KB one.</p>
-
-<!-- snippet:
-filename: wxr-read.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-use WordPress\DataLiberation\EntityReader\WXREntityReader;
-
-$wxr = <<<XML
-<?xml version="1.0" encoding="UTF-8"?>
-<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
-<channel>
-<title>Demo</title>
-<item><title>First</title><wp:post_id>1</wp:post_id><wp:post_type>post</wp:post_type><content:encoded>Body 1</content:encoded></item>
-<item><title>Second</title><wp:post_id>2</wp:post_id><wp:post_type>post</wp:post_type><content:encoded>Body 2</content:encoded></item>
-</channel>
-</rss>
-XML;
-
-$reader = WXREntityReader::create();
-$reader->append_bytes( $wxr );
-$reader->input_finished();
-
-while ( $reader->next_entity() ) {
-    $entity = $reader->get_entity();
-    echo $entity->get_type() . ': ' . json_encode( $entity->get_data() ) . "\n";
-}
-```
-
-<!-- expected-output -->
-```
-site_option: {"option_name":"blogname","option_value":"Demo"}
-post: {"post_title":"First","post_id":"1","post_type":"post","post_content":"Body 1"}
-post: {"post_title":"Second","post_id":"2","post_type":"post","post_content":"Body 2"}
-```
-
-## Streaming transform: rewrite URLs while copying WXR
-

-<p>Wire reader to writer to rewrite a WXR file on the fly. This pattern is how you migrate a staging export to production: swap <code>staging.example.com</code> for <code>example.com</code> without ever loading the file into memory.</p>
-
-<!-- snippet:
-filename: wxr-rewrite-urls.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-use WordPress\ByteStream\MemoryPipe;
-use WordPress\DataLiberation\EntityReader\WXREntityReader;
-use WordPress\DataLiberation\EntityWriter\WXRWriter;
-use WordPress\DataLiberation\ImportEntity;
-
-$source_xml = <<<XML
-<?xml version="1.0" encoding="UTF-8"?>
-<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
-<channel>
-<item><title>Hello</title><wp:post_id>1</wp:post_id><wp:post_type>post</wp:post_type>
-<content:encoded>Visit https://staging.example.com/about for more.</content:encoded>
-</item>
-</channel>
-</rss>
-XML;
-
-$reader = WXREntityReader::create();
-$reader->append_bytes( $source_xml );
-$reader->input_finished();
-
-$out_pipe = new MemoryPipe();
-$writer = new WXRWriter( $out_pipe );
-
-while ( $reader->next_entity() ) {
-    $entity = $reader->get_entity();
-    $data = $entity->get_data();
-    foreach ( array( 'post_content', 'content', 'description' ) as $field ) {
-        if ( isset( $data[ $field ] ) ) {
-            $data[ $field ] = str_replace( 'staging.example.com', 'example.com', $data[ $field ] );
-        }
-    }
-    if ( 'post' === $entity->get_type() ) {
-        $data['content'] = isset( $data['post_content'] ) ? $data['post_content'] : ( isset( $data['content'] ) ? $data['content'] : '' );
-    }
-    $writer->append_entity( new ImportEntity( $entity->get_type(), $data ) );
-}
-
-$writer->finalize();
-$writer->close_writing();
-$out_pipe->close_writing();
-
-$wxr = $out_pipe->consume_all();
-echo false !== strpos( $wxr, 'https://example.com/about' ) ? "new URL present\n" : "new URL missing\n";
-echo false === strpos( $wxr, 'staging.example.com' ) ? "old URL removed\n" : "old URL still present\n";
-```
-
-<!-- expected-output -->
-```
-new URL present
-old URL removed
-```
-
-## Render Markdown into a WXR import in one pipeline
-

-<p>Compose <code>MarkdownConsumer</code> with <code>WXRWriter</code> to publish a folder of Markdown directly as a WordPress import file.</p>
-
-<!-- snippet:
-filename: markdown-to-wxr.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-use WordPress\ByteStream\MemoryPipe;
-use WordPress\DataLiberation\EntityWriter\WXRWriter;
-use WordPress\DataLiberation\ImportEntity;
-use WordPress\Markdown\MarkdownConsumer;
-
-$dir = sys_get_temp_dir() . '/md-demo';
-@mkdir( $dir );
-file_put_contents( $dir . '/first.md', "# First\n\nHello from Markdown." );
-file_put_contents( $dir . '/second.md', "---\ntitle: Second\n---\n\n## Heading\n\nMore text." );
-
-$pipe = new MemoryPipe();
-$writer = new WXRWriter( $pipe );
-$id = 1;
-
-foreach ( glob( $dir . '/*.md' ) as $path ) {
-    $consumer = new MarkdownConsumer( file_get_contents( $path ) );
-    $consumer->consume();
-    $writer->append_entity( new ImportEntity( 'post', array(
-        'post_id' => (string) $id++,
-        'post_title' => $consumer->get_meta_value( 'title' ) ?: basename( $path, '.md' ),
-        'content' => $consumer->get_block_markup(),
-        'status' => 'publish',
-        'post_type' => 'post',
-        'post_name' => basename( $path, '.md' ),
-    ) ) );
-}
-
-$writer->finalize();
-$writer->close_writing();
-$pipe->close_writing();
-
-$wxr = $pipe->consume_all();
-echo "posts: " . substr_count( $wxr, '<item>' ) . "\n";
-echo false !== strpos( $wxr, '<!-- wp:heading' ) ? "block markup exported\n" : "block markup missing\n";
-echo false !== strpos( $wxr, 'Second' ) ? "frontmatter title exported\n" : "frontmatter title missing\n";
-```
-
-<!-- expected-output -->
-```
-posts: 2
-block markup exported
-frontmatter title exported
-```
diff --git a/bin/_docs_components/encoding.md b/bin/_docs_components/encoding.md
deleted file mode 100644
index e963c6d83..000000000
--- a/bin/_docs_components/encoding.md
+++ /dev/null
@@ -1,196 +0,0 @@
----
-slug: encoding
-title: Encoding
-install: wp-php-toolkit/encoding
-
-see_also: html | HTML | Normalize incoming text before HTML tokenization.
-see_also: xml | XML | Keep invalid bytes out of XML streams.
-see_also: dataliberation | DataLiberation | Clean content before importing it into WordPress.
----
-
-UTF-8 validation and scrubbing with a pure-PHP fallback when mbstring is unavailable. Detects malformed bytes and replaces them per the Unicode maximal-subpart algorithm.
-
-## Why this exists

-<p>Every parser in this toolkit eventually has to decide what to do with text bytes. XML rejects malformed UTF-8. JSON and databases can fail late. CSS, HTML, WXR, and Blueprint validation all need consistent answers about whether a string is well-formed Unicode.</p>
-
-<p>The Encoding component provides the small UTF-8 primitives the rest of the toolkit can share: validate bytes, scrub invalid sequences, scan code points, and detect Unicode noncharacters. When mbstring is available it can delegate to it; when it is not, the component uses its own byte scanner so behavior stays available in restricted PHP environments.</p>

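<p>As a sketch of what that pure-PHP byte scanner has to do — illustrative only, not the component's actual implementation, and the function name here is invented — a well-formedness check is one pass over the bytes that validates each lead byte's allowed continuation ranges (Unicode's Table 3-7):</p>

```php
<?php
// Illustrative sketch, NOT the Encoding component's code: a single-pass
// UTF-8 well-formedness scan using the lead-byte/continuation ranges
// from the Unicode standard (Table 3-7).
function is_well_formed_utf8( string $bytes ): bool {
    $len = strlen( $bytes );
    for ( $i = 0; $i < $len; ) {
        $b = ord( $bytes[ $i ] );
        if ( $b < 0x80 ) { $i += 1; continue; } // ASCII
        if ( $b >= 0xC2 && $b <= 0xDF ) { $need = 1; $lo = 0x80; $hi = 0xBF; }
        elseif ( 0xE0 === $b ) { $need = 2; $lo = 0xA0; $hi = 0xBF; }          // rejects overlongs
        elseif ( $b >= 0xE1 && $b <= 0xEC ) { $need = 2; $lo = 0x80; $hi = 0xBF; }
        elseif ( 0xED === $b ) { $need = 2; $lo = 0x80; $hi = 0x9F; }          // rejects surrogates
        elseif ( $b >= 0xEE && $b <= 0xEF ) { $need = 2; $lo = 0x80; $hi = 0xBF; }
        elseif ( 0xF0 === $b ) { $need = 3; $lo = 0x90; $hi = 0xBF; }
        elseif ( $b >= 0xF1 && $b <= 0xF3 ) { $need = 3; $lo = 0x80; $hi = 0xBF; }
        elseif ( 0xF4 === $b ) { $need = 3; $lo = 0x80; $hi = 0x8F; }          // caps at U+10FFFF
        else { return false; } // 0x80–0xC1 and 0xF5–0xFF never start a sequence
        if ( $i + $need >= $len ) { return false; } // truncated sequence
        $c = ord( $bytes[ $i + 1 ] );
        if ( $c < $lo || $c > $hi ) { return false; } // constrained first continuation
        for ( $k = 2; $k <= $need; $k++ ) {
            $c = ord( $bytes[ $i + $k ] );
            if ( $c < 0x80 || $c > 0xBF ) { return false; }
        }
        $i += $need + 1;
    }
    return true;
}

echo is_well_formed_utf8( "Caf\xC3\xA9" ) ? "well-formed\n" : "ill-formed\n"; // well-formed
echo is_well_formed_utf8( "caf\xE9" ) ? "well-formed\n" : "ill-formed\n";     // ill-formed
```

<p>The real entry point for callers is <code>wp_is_valid_utf8()</code>, shown in the sections that follow; the sketch only makes the fallback's byte-level work concrete.</p>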
-<p>Historically, this became the common foundation for Blueprint validation and CSS/XML processing, replacing ad hoc Unicode helpers with the WordPress core UTF-8 routines used here.</p>
-
-## Validating UTF-8 before storing it
-

-<p><code>wp_is_valid_utf8()</code> rejects overlong sequences, surrogate halves, and stray ISO-8859-1 bytes. Use it as a guard in front of any code path that assumes UTF-8 (database, JSON, XML).</p>
-
-<!-- snippet:
-filename: validate-utf8.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-$samples = array(
-    'ASCII' => 'just a test',
-    'UTF-8 pencil' => "\xE2\x9C\x8F",
-    'latin-1 byte' => "B\xFCch",
-    'overlong slash' => "\xC1\xBF",
-    'surrogate half' => "\xED\xB0\x80",
-);
-
-foreach ( $samples as $label => $bytes ) {
-    echo sprintf( "%-14s %s\n", $label . ':', wp_is_valid_utf8( $bytes ) ? 'valid' : 'invalid' );
-}
-```
-
-<!-- expected-output -->
-```
-ASCII:          valid
-UTF-8 pencil:   valid
-latin-1 byte:   invalid
-overlong slash: invalid
-surrogate half: invalid
-```
-
-## Scrubbing invalid bytes with U+FFFD
-

-<p>Replace each ill-formed sequence with the Unicode replacement character. Useful right before serializing to XML, JSON, or sending to an LLM that will choke on broken bytes.</p>
-
-<!-- snippet:
-filename: scrub-utf8.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-echo wp_scrub_utf8( "the byte \xFF should not be here.\n" );
-echo wp_scrub_utf8( ".\xC1\xBF.\n" );
-```
-
-<!-- expected-output -->
-```
-the byte � should not be here.
-.��.
-```
-
-## Detecting noncharacters MySQL/utf8mb4 will reject
-

-<p>Code points like U+FFFE, U+FFFF, and the U+FDD0–U+FDEF block are valid Unicode but forbidden in XML and rejected by some databases. Check before inserting user-submitted content into a strict utf8mb4 column.</p>
-
-<!-- snippet:
-filename: noncharacters.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-$samples = array(
-    'normal text' => 'normal text',
-    'U+FFFE' => "oops \u{FFFE}",
-    'U+FDD0' => "hi \u{FDD0} bye",
-);
-
-foreach ( $samples as $label => $text ) {
-    echo sprintf( "%-12s %s\n", $label . ':', wp_has_noncharacters( $text ) ? 'reject' : 'ok' );
-}
-```
-
-<!-- expected-output -->
-```
-normal text: ok
-U+FFFE:      reject
-U+FDD0:      reject
-```
-
-## Three-way pipeline: validate, scrub, then check noncharacters
-

-<p>Real-world inputs are messy: an old WXR export, a CSV with mixed encodings, a paste from Word. Combining validate + scrub + noncharacter-check covers the three classes of breakage that bite later.</p>
-
-<!-- snippet:
-filename: validate-scrub-pipeline.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-$inputs = array(
-    'good' => 'Café',
-    'latin1' => "caf\xE9",
-    'overlong' => "x\xC1\xBFy",
-    'noncharac' => "hi \u{FFFE} there",
-);
-
-foreach ( $inputs as $label => $bytes ) {
-    $valid = wp_is_valid_utf8( $bytes );
-    $cleaned = wp_scrub_utf8( $bytes );
-    $weird = wp_has_noncharacters( $cleaned );
-    echo sprintf( "%-10s valid=%s noncharacter=%s -> %s\n", $label, $valid ? 'Y' : 'N', $weird ? 'Y' : 'N', $cleaned );
-}
-```
-
-<!-- expected-output -->
-```
-good       valid=Y noncharacter=N -> Café
-latin1     valid=N noncharacter=N -> caf�
-overlong   valid=N noncharacter=N -> x��y
-noncharac  valid=Y noncharacter=Y -> hi ￾ there
-```
-
-## Salvaging a legacy ISO-8859-1 column inside a UTF-8 corpus
-

-<p>Old WordPress databases sometimes mix encodings: most rows are UTF-8 but a few were stored as latin-1. Detect the bad rows with <code>wp_is_valid_utf8()</code> and only re-encode those.</p>
-
-<!-- snippet:
-filename: salvage-latin1.php
-runnable: true
--->
-```php
-<?php
-require '/wordpress/wp-content/php-toolkit/vendor/autoload.php';
-
-$rows = array(
-    1 => 'Plain ASCII',
-    2 => 'Café',
-    3 => "caf\xE9",
-    4 => "weird \xC0 byte",
-);
-
-foreach ( $rows as $id => $value ) {
-    if ( wp_is_valid_utf8( $value ) ) {
-        echo "#$id ok: $value\n";
-        continue;
-    }
-    $converted = @iconv( 'ISO-8859-1', 'UTF-8', $value );
-    if ( false !== $converted && wp_is_valid_utf8( $converted ) ) {
-        echo "#$id recovered as latin1: $converted\n";
-    } else {
-        echo "#$id unrecoverable, scrubbing: " . wp_scrub_utf8( $value ) . "\n";
-    }
-}
-```
-
-<!-- expected-output -->
-```
-#1 ok: Plain ASCII
-#2 ok: Café
-#3 recovered as latin1: café
-#4 recovered as latin1: weird À byte
-```
diff --git a/bin/_docs_components/filesystem.md b/bin/_docs_components/filesystem.md
deleted file mode 100644
index 492dfee29..000000000
--- a/bin/_docs_components/filesystem.md
+++ /dev/null
@@ -1,260 +0,0 @@
----
-slug: filesystem
-title: Filesystem
-install: wp-php-toolkit/filesystem
-
-see_also: bytestream | ByteStream | Open files as readers and writers instead of loading full strings.
-see_also: zip | Zip | Mount archives and copy data between archive-backed and normal filesystems.
-see_also: git | Git | Expose repository trees through a filesystem-shaped API.
----
-
-One Filesystem interface across local disk, in-memory trees, SQLite databases, and ZIP archives. Forward-slash paths everywhere — even on Windows — so the same code runs in tests, in production, and inside read-only ZIPs.
-
-## Why this exists

Code that touches the filesystem is hard to test, hard to port to Windows, and impossible to point at non-disk storage without rewriting it. Swap LocalFilesystem for InMemoryFilesystem in tests and your suite stops touching /tmp; swap it for SQLiteFilesystem and your "files" become rows in a portable database; swap it for ZipFilesystem and you can read inside an archive with the same calls.

- -

Every backend uses forward slashes regardless of host OS. No DIRECTORY_SEPARATOR juggling, no Windows-only test failures, no surprises when a path moves between backends.

- -## In-memory tree - -

The fastest backend. No disk I/O, no cleanup, no test-isolation problems.

- - -```php -put_contents( '/hello.txt', 'Hello, world!' ); -echo $fs->get_contents( '/hello.txt' ); -``` - - -``` -Hello, world! -``` - -## Test code without touching disk - -

Code that takes a Filesystem parameter, instead of calling file_get_contents() directly, can be tested against an InMemoryFilesystem. The test sets up files in memory, exercises the function, and asserts on what got written — no temp directories, no cleanup.

- - -```php -get_contents( $path ), true ); - list( $maj, $min, $patch ) = explode( '.', $json['version'] ); - $json['version'] = $maj . '.' . $min . '.' . ( (int) $patch + 1 ); - $fs->put_contents( $path, json_encode( $json ) ); -} - -$fs = InMemoryFilesystem::create(); -$fs->put_contents( '/package.json', '{"version":"1.2.3"}' ); -bump_version( $fs, '/package.json' ); - -echo $fs->get_contents( '/package.json' ) . "\n"; -``` - - -``` -{"version":"1.2.4"} -``` - -## Local disk with a chrooted root - -

LocalFilesystem::create($root) is implicitly chrooted: every path resolves relative to $root and a ../ cannot escape. Reach for it when a request path or CLI argument names a file inside one project directory.

- - -```php -mkdir( '/uploads', array( 'recursive' => true ) ); -$fs->put_contents( '/uploads/note.txt', 'Hi from local disk.' ); - -echo $fs->get_contents( '/uploads/../uploads/note.txt' ) . "\n"; - -$fs->rmdir( '/', array( 'recursive' => true ) ); -echo "exists after cleanup? " . ( is_dir( $root ) ? 'yes' : 'no' ) . "\n"; -``` - - -``` -Hi from local disk. -exists after cleanup? no -``` - -## SQLite as a portable file store - -

The whole tree lives in one SQLite database file. Use it for self-contained scratch storage that survives process boundaries without leaving loose files behind.

- - -```php -mkdir( '/posts', array( 'recursive' => true ) ); -for ( $i = 1; $i <= 3; $i++ ) { - $fs->put_contents( "/posts/post-{$i}.md", "# Post {$i}\n\nBody {$i}." ); -} - -foreach ( $fs->ls( '/posts' ) as $name ) { - $first = strtok( $fs->get_contents( '/posts/' . $name ), "\n" ); - echo "{$name}: {$first}\n"; -} -``` - - -``` -post-1.md: # Post 1 -post-2.md: # Post 2 -post-3.md: # Post 3 -``` - -## Copy a tree across backends - -

The killer composability move: copy_between_filesystems() streams files chunk-by-chunk from any source to any target. Pull a ZIP into SQLite, snapshot SQLite to disk, mirror disk into RAM — all the same call.

- - -```php -mkdir( '/site/posts', array( 'recursive' => true ) ); -$local->put_contents( '/site/posts/2024-01.md', '# Hello 2024' ); -$local->put_contents( '/site/index.html', '

Home

' ); - -$sqlite = SQLiteFilesystem::create( ':memory:' ); -copy_between_filesystems( array( - 'source_filesystem' => $local, - 'source_path' => '/site', - 'target_filesystem' => $sqlite, - 'target_path' => '/snapshot', -) ); - -$mem = InMemoryFilesystem::create(); -copy_between_filesystems( array( - 'source_filesystem' => $sqlite, - 'source_path' => '/snapshot', - 'target_filesystem' => $mem, - 'target_path' => '/copy', -) ); - -echo "in memory after two copies:\n"; -echo " posts: " . implode( ', ', $mem->ls( '/copy/posts' ) ) . "\n"; -echo " index: " . $mem->get_contents( '/copy/index.html' ) . "\n"; - -$local->rmdir( '/', array( 'recursive' => true ) ); -``` - - -``` -in memory after two copies: - posts: 2024-01.md - index:

Home

-``` - -## Atomic write via tempfile rename - -

Write to a sibling tempfile, then rename — that's how you avoid leaving a half-written file on crash. rename() is atomic within a single filesystem.

- - -```php -put_contents( $tmp, $bytes ); - $fs->rename( $tmp, $path ); -} - -$root = sys_get_temp_dir() . '/atomic-' . uniqid(); -$fs = LocalFilesystem::create( $root ); - -$fs->put_contents( '/config.json', '{"v":1}' ); -atomic_put_contents( $fs, '/config.json', '{"v":2}' ); - -echo "config: " . $fs->get_contents( '/config.json' ) . "\n"; -echo "no .tmp leftovers: " . count( $fs->ls( '/' ) ) . " entries in root\n"; - -$fs->rmdir( '/', array( 'recursive' => true ) ); -``` - - -``` -config: {"v":2} -no .tmp leftovers: 1 entries in root -``` - -## Path helpers that behave the same on Windows - -

Unix path semantics apply on every host OS. This matters for abstract paths such as a SQLite key or a ZIP entry name because those paths do not live on a real drive.

- - -```php - -``` -/var/www/site/index.php -/a/b -a/c/e -``` diff --git a/bin/_docs_components/git.md b/bin/_docs_components/git.md deleted file mode 100644 index 58076fb53..000000000 --- a/bin/_docs_components/git.md +++ /dev/null @@ -1,273 +0,0 @@ ---- -slug: git -title: Git -install: wp-php-toolkit/git - -see_also: filesystem | Filesystem | Work with repository trees through a storage abstraction. -see_also: merge | Merge | Resolve divergent histories with explicit three-way merge logic. -see_also: bytestream | ByteStream | Read and write object data without accidental buffering. ---- - -A pure-PHP Git client and server. Commits, branches, diffs, HTTP push/pull — all without shelling out to git. - -## Why this exists - -

Git is a useful storage model even when a server cannot run the git binary: snapshots, branches, object-addressed files, diffs, merges, and sync over HTTP. That matters for WordPress tools that want revision history for generated files, content snapshots, site state, or collaborative edits in constrained runtimes.

- -

The Git component implements the core repository operations in PHP and stores objects through the toolkit Filesystem interface. That means the same repository can live on disk, in memory, or in another backend, and higher-level code can commit files without knowing where objects are stored.

- -

The docs start with simple commits because that mental model scales: a repository is just objects plus refs. From there, branches, history walking, root commits, and merges become details you can reason about instead of magic shell behavior.

- -

Choose it for tests, browser-like sandboxes, hosted WordPress environments, and applications that need Git behavior through PHP APIs instead of shell commands.

- -## Commit files into an in-memory repo - -

The simplest possible repository: an InMemoryFilesystem as object storage and one commit() call. Reach for this in tests, in WP-CLI snapshots, or any place you want versioning without touching disk.

- - -```php -commit( array( - 'updates' => array( - 'README.md' => "# My Project\n", - 'src/hello-world.php' => 'get_branch_tip( 'HEAD' ) . "\n"; -echo "README: " . $repo->read_object_by_path( '/README.md' )->consume_all(); -``` - - -``` -commit: -HEAD: -README: # My Project -``` - -## Walk the commit history - -

Follow the parent chain from HEAD backwards. Building block for a WP-CLI "post revisions" log or a "what changed since release X" report.

- - -```php - $msg ) { - $repo->commit( array( - 'updates' => array( 'post.md' => "# Draft {$i}" ), - 'commit' => array( 'message' => $msg ), - ) ); -} - -$oid = $repo->get_branch_tip( 'HEAD' ); -while ( ! Commit::is_null_hash( $oid ) ) { - $c = $repo->read_object( $oid )->as_commit(); - echo substr( $c->hash, 0, 7 ) . ' ' . trim( $c->message ) . "\n"; - $oid = $c->get_first_parent_hash(); - if ( ! $oid || ! $repo->has_object( $oid ) ) break; -} -``` - - -``` - expand examples - fix typo - add intro -``` - -## Treat a repository like a filesystem - -

GitFilesystem wraps a repository in this toolkit's Filesystem interface. With the default options, each put_contents() records a new commit.

- - -```php -put_contents( '/posts/hello.md', "# Hello\nFirst draft." ); -$fs->put_contents( '/posts/about.md', "# About\nWho we are." ); -$fs->put_contents( '/posts/hello.md', "# Hello\nSecond draft." ); - -echo "tree:\n"; -foreach ( $fs->ls( '/posts' ) as $name ) { - echo " /posts/{$name}\n"; -} -echo "\nhello.md now:\n" . $fs->get_contents( '/posts/hello.md' ) . "\n"; -``` - - -``` -tree: - /posts/about.md - /posts/hello.md - -hello.md now: -# Hello -Second draft. -``` - -## Branch, edit, and switch back - -

Create a feature branch off the current commit, change files, flip HEAD back. Useful for experimental edits in collaborative tools.

- - -```php -commit( array( - 'updates' => array( 'config.json' => '{"flag":false}' ), - 'commit' => array( 'message' => 'baseline' ), -) ); - -$repo->create_branch( 'refs/heads/experiment', $base ); -$repo->checkout( 'refs/heads/experiment' ); -$repo->commit( array( - 'updates' => array( 'config.json' => '{"flag":true}' ), - 'commit' => array( 'message' => 'flip the flag' ), -) ); - -echo "on experiment: " . $repo->read_object_by_path( '/config.json' )->consume_all() . "\n"; - -$repo->checkout( 'refs/heads/trunk' ); -echo "on trunk: " . $repo->read_object_by_path( '/config.json' )->consume_all() . "\n"; -``` - - -``` -on experiment: {"flag":true} -on trunk: {"flag":false} -``` - -## Three-way merge two branches - -

The classic Git workflow: branch off, edit on each side, merge. $repo->merge() finds the common ancestor, three-way-merges every file, and creates a merge commit.

- - -```php -commit( array( 'updates' => array( - 'todo.txt' => "buy milk\nwalk dog\nread book\n", -) ) ); - -$repo->commit( array( 'updates' => array( - 'todo.txt' => "buy oat milk\nwalk dog\nread book\n", -) ) ); - -$repo->create_branch( 'refs/heads/feature', $base ); -$repo->checkout( 'refs/heads/feature' ); -$repo->commit( array( 'updates' => array( - 'todo.txt' => "buy milk\nwalk dog\nread book\nwrite blog post\n", -) ) ); - -$repo->checkout( 'refs/heads/trunk' ); -$result = $repo->merge( 'refs/heads/feature' ); - -echo "merge head: {$result['new_head']}\n"; -echo "conflicts: " . ( $result['conflicts'] ? implode( ',', $result['conflicts'] ) : 'none' ) . "\n"; -echo "result:\n" . $repo->read_object_by_path( '/todo.txt' )->consume_all(); -``` - - -``` -merge head: -conflicts: none -result: -buy oat milk -walk dog -read book -write blog post -``` - -## Snapshot WordPress options into a repo - -

Serialize a chunk of WP state (options, post meta, a theme config) on every save and commit it. You get free history, diffs between snapshots, and a "rollback to last week" button.

- - -```php - 'My Site', 'posts_per_page' => 10, 'timezone_string' => 'UTC' ), - array( 'blogname' => 'My Site', 'posts_per_page' => 20, 'timezone_string' => 'UTC' ), - array( 'blogname' => 'New Name', 'posts_per_page' => 20, 'timezone_string' => 'Europe/Warsaw' ), -); - -foreach ( $snapshots as $i => $options ) { - $repo->commit( array( - 'updates' => array( 'options.json' => json_encode( $options, JSON_PRETTY_PRINT ) ), - 'commit' => array( 'message' => "snapshot #{$i}" ), - ) ); -} - -$head = $repo->get_branch_tip( 'HEAD' ); -$parent = $repo->read_object( $head )->as_commit()->get_first_parent_hash(); -$diff = $repo->diff_commits( $head, $parent ); - -echo "Files changed in last snapshot:\n"; -foreach ( $diff as $name => $entry ) { - echo " {$name}\n"; -} -``` - - -``` -Files changed in last snapshot: - options.json -``` diff --git a/bin/_docs_components/html.md b/bin/_docs_components/html.md deleted file mode 100644 index b2aa2c50f..000000000 --- a/bin/_docs_components/html.md +++ /dev/null @@ -1,414 +0,0 @@ ---- -slug: html -title: HTML -install: wp-php-toolkit/html - -credit_title: Ported from WordPress core -credit_body: | - The HTML component is a port of WordPress core's WP_HTML_Tag_Processor and WP_HTML_Processor. Source: WordPress/wordpress-develop. Bug fixes flow in both directions. - -see_also: ../learn/01-rewriting-html.html | Tutorial — Rewriting HTML safely | The chapter that introduces the cursor model and the clean_post_html() function reused later in the importer. -see_also: blockparser | BlockParser | Parse block comments first, then rewrite the HTML inside each block. -see_also: markdown | Markdown | Convert Markdown to blocks before polishing generated HTML. -see_also: dataliberation | DataLiberation | Rewrite URLs and media references during import/export pipelines. ---- - -A pure-PHP HTML5 parser and tag rewriter mirroring WordPress core's HTML API. 
Treat HTML the way browsers do — without libxml2, DOMDocument, or regex hacks — and rewrite attributes in a single linear pass. - -## Why this exists - -

WordPress runs HTML fragments through filters every time a request renders: post content, block markup, comments, excerpts, widgets, feeds, imported documents. Those fragments can omit <html> and <body>, close tags implicitly, or mix browser-correct markup with author mistakes that DOMDocument and regular expressions do not model well.

- -

The HTML component gives WordPress-style code the same parsing model WordPress core uses: a browser-compatible tokenizer and tree-aware processor that run in pure PHP. Choose it for exact-byte rewrites, imperfect fragments, and post-content filters where a full DOM would do too much work.

- -

The component gives you two processors. WP_HTML_Tag_Processor is a forward-only cursor over tags and tokens — useful for attribute rewriting at scale. WP_HTML_Processor layers HTML5 tree construction on top so you can query by ancestry (breadcrumbs), serialize the parsed document, and trust that <p>one<p>two parses as two paragraphs the way a browser sees it.

- -

Footgun: Mutations are buffered. Nothing changes in the source string until you call get_updated_html(). If you read get_attribute() after a set_attribute() on the same tag, you see the new value — but downstream tooling reading the original string sees stale HTML until you serialize.

- -## Add loading="lazy" to every image - -

The "hello world" of tag rewriting. One linear pass, no DOM, no reserialization cost beyond the bytes you actually changed.

- -

Try this: click Run, then change 'lazy' to 'eager' on the first image only by guarding it with $tags->get_attribute( 'src' ) === 'hero.jpg'. Run again and notice that get_updated_html() only rewrites the bytes for that one tag.

- - -```php - - Hero -

Intro copy.

- Inline - -HTML; - -$tags = new WP_HTML_Tag_Processor( $html ); -while ( $tags->next_tag( 'img' ) ) { - // Don't clobber an explicit eager hint the author already set. - if ( null === $tags->get_attribute( 'loading' ) ) { - $tags->set_attribute( 'loading', 'lazy' ); - } - $tags->set_attribute( 'decoding', 'async' ); -} - -echo $tags->get_updated_html(); -``` - - -``` -
- Hero -

Intro copy.

- Inline -
-``` - -## Rewrite relative links to absolute URLs - -

Use this before sending post content to an RSS feed, an email template, or a CDN-backed copy of a site. The processor rewrites only the changed bytes, so untouched markup stays byte-identical.

- - -```php -See about, x, -and contact.

-HTML; - -$base = 'https://my-site.test/'; - -$tags = new WP_HTML_Tag_Processor( $html ); -while ( $tags->next_tag( 'a' ) ) { - $href = $tags->get_attribute( 'href' ); - if ( null === $href || '' === $href ) { - continue; - } - if ( preg_match( '#^[a-z][a-z0-9+.-]*:#i', $href ) || 0 === strpos( $href, '//' ) || 0 === strpos( $href, '#' ) ) { - continue; - } - $tags->set_attribute( 'href', rtrim( $base, '/' ) . '/' . ltrim( $href, '/' ) ); -} - -echo $tags->get_updated_html(); -``` - - -``` -

See about, x, -and contact.

-``` - -## Strip every script and inline event handler - -

A common sanitization step: neutralize untrusted HTML before display. Blank a script's body with set_modifiable_text() and strip every on* attribute via get_attribute_names_with_prefix().

- - -```php -hi

- - -HTML; - -$tags = new WP_HTML_Tag_Processor( $untrusted ); -while ( $tags->next_tag() ) { - // next_tag() never lands on closing tags, so no is_tag_closer() guard - // is needed here. - if ( 'SCRIPT' === $tags->get_tag() ) { - $tags->set_modifiable_text( '' ); - } - foreach ( $tags->get_attribute_names_with_prefix( 'on' ) as $attr ) { - $tags->remove_attribute( $attr ); - } -} - -echo $tags->get_updated_html(); -``` - - -``` -

hi

- - -``` - -## Stamp a CSP nonce on inline scripts and styles - -

Content Security Policy in nonce- mode requires every inline <script> and <style> to carry a matching nonce attribute. Tag-by-tag is exactly the right granularity.

- - -```php - - -HTML; - -$tags = new WP_HTML_Tag_Processor( $html ); -while ( $tags->next_tag() ) { - $tag = $tags->get_tag(); - if ( 'SCRIPT' === $tag || 'STYLE' === $tag ) { - $tags->set_attribute( 'nonce', $nonce ); - } -} - -echo "nonce: {$nonce}\n\n"; -echo $tags->get_updated_html(); -``` - - -``` -nonce: - - - -``` - -## Build a srcset from a single src - -

Generate responsive image markup at render time without touching the editor data model. Read the existing src, derive a srcset with width descriptors, add a sizes hint.

- - -```php -Sunset'; -$widths = array( 480, 768, 1200 ); - -$tags = new WP_HTML_Tag_Processor( $html ); -while ( $tags->next_tag( 'img' ) ) { - $src = $tags->get_attribute( 'src' ); - if ( null === $src || $tags->get_attribute( 'srcset' ) !== null ) { - continue; - } - $variants = array(); - foreach ( $widths as $w ) { - $variants[] = $src . '?w=' . $w . ' ' . $w . 'w'; - } - $tags->set_attribute( 'srcset', implode( ', ', $variants ) ); - $tags->set_attribute( 'sizes', '(max-width: 768px) 100vw, 768px' ); -} - -echo $tags->get_updated_html(); -``` - - -``` -
Sunset
-``` - -## Decode HTML entities the way the spec demands - -

The HTML5 entity table has roughly 2,200 named references and a long list of edge cases. WP_HTML_Decoder implements the algorithm — don't roll your own.

- - -```php - -``` -attribute: path?a=1&b=2© -text: AT&T — 100% 😀 -bool(true) -``` - -## Find images by ancestry with breadcrumbs - -

The full WP_HTML_Processor understands HTML5 tree construction, so you can ask "find every <img> directly inside a <figure>" without writing your own DOM walker.

- - -```php - -
Hero
Hero shot
-

Body copy mid-paragraph.

-
Diagram
- -HTML; - -$p = WP_HTML_Processor::create_fragment( $html ); -$figure_images = 0; -while ( $p->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) ) ) { - $p->add_class( 'figure-image' ); - $figure_images++; -} - -echo "found {$figure_images} figure images\n"; -echo $p->get_updated_html(); -``` - - -``` -found 2 figure images -
-
Hero
Hero shot
-

Body copy mid-paragraph.

-
Diagram
-
-``` - -## Outline a document by walking tokens with depth - -

The full processor exposes get_current_depth() and get_breadcrumbs(). Combine with next_token() to print a structural outline.

- - -```php -

Title

-

Chapter 1

Body

-

Chapter 2

More body

- -HTML; - -$p = WP_HTML_Processor::create_fragment( $html ); -while ( $p->next_token() ) { - if ( '#tag' !== $p->get_token_type() || $p->is_tag_closer() ) { - continue; - } - $tag = $p->get_tag(); - if ( ! preg_match( '/^H[1-6]$/', $tag ) ) { - continue; - } - $indent = str_repeat( ' ', max( 0, $p->get_current_depth() - 2 ) ); - $text = ''; - while ( $p->next_token() ) { - if ( '#text' === $p->get_token_type() ) { - $text .= $p->get_modifiable_text(); - continue; - } - if ( '#tag' === $p->get_token_type() && $tag === $p->get_tag() && $p->is_tag_closer() ) { - break; - } - } - echo "{$indent}{$tag} {$text}\n"; -} -``` - - -``` - H1 Title - H2 Chapter 1 - H2 Chapter 2 -``` - -## Bookmarks: annotate a parent based on its children - -

Bookmarks are the one escape from forward-only scanning. Save a position, scan ahead, decide what to do, then seek() back and rewrite the earlier tag.

- - -```php - -
  • Buy milk
  • -
  • Walk the dog
  • -
  • Read book
  • - -HTML; - -$tags = new WP_HTML_Tag_Processor( $html ); -$tags->next_tag( 'ul' ); -$tags->set_bookmark( 'list' ); - -$total = 0; -$done = 0; -while ( $tags->next_tag( 'input' ) ) { - $total++; - if ( null !== $tags->get_attribute( 'checked' ) ) { - $done++; - } -} - -$tags->seek( 'list' ); -$tags->set_attribute( 'data-progress', $done . '/' . $total ); -$tags->release_bookmark( 'list' ); - -echo $tags->get_updated_html(); -``` - - -``` -
      -
    • Buy milk
    • -
    • Walk the dog
    • -
    • Read book
    • -
    -``` - -## When to use which - - - - - - - -
    UseFor
    WP_HTML_Tag_ProcessorAttribute rewriting, sanitization, finding tags by name. Forward-only walks. Anything where speed and byte-honesty matter more than context.
    WP_HTML_Processor::create_fragment()Queries by ancestry (breadcrumbs), heading outline extraction, anything that needs to know "is this tag inside that one."
    WP_HTML_Decoder::decode_text_node()Turning entity-encoded text (AT&amp;T) back into raw text correctly. Implements the HTML5 entity algorithm — don't roll your own.
    WP_HTML_Decoder::attribute_starts_with()Safe URL-prefix checks that decode HTML character references while comparing — so j&#x61;vascript: (where &#x61; is the letter a) is correctly recognized as starting with javascript:. The classic strpos approach misses these.
    - -

    Footgun: next_tag() only stops on opening tags. Closers and text are skipped, so a guard like ! $tags->is_tag_closer() inside a next_tag() loop is harmless but never fires. If you need to visit closing tags or text nodes, use next_token() instead and check get_token_type().

    - -

    Footgun: Tag-name matches are uppercase. get_tag() always returns the tag name in uppercase ('IMG', not 'img'). Compare accordingly. The filter argument to next_tag() is case-insensitive in either direction.

    - -

    Footgun: Don't confuse WP_HTML_Tag_Processor with the full processor. The cursor is forward-only and ancestry-blind, and it doesn't expose get_breadcrumbs() at all — calling that on a WP_HTML_Tag_Processor raises a Call to undefined method error. Breadcrumbs and HTML5 tree construction (implicit <tbody> insertion, automatic <p> closing, and the rest) live only on WP_HTML_Processor.

    diff --git a/bin/_docs_components/httpclient.md b/bin/_docs_components/httpclient.md deleted file mode 100644 index b62f1d1f2..000000000 --- a/bin/_docs_components/httpclient.md +++ /dev/null @@ -1,606 +0,0 @@ ---- -slug: httpclient -title: HttpClient -install: wp-php-toolkit/http-client - -see_also: ../learn/04-talking-to-the-network.html | Tutorial — Talking to the network | Walks through a streaming downloader that resumes, fans out, and pipes bytes to disk without buffering. -see_also: bytestream | ByteStream | Stream request and response bodies. -see_also: filesystem | Filesystem | Persist large downloads without buffering them in memory. -see_also: corsproxy | CORSProxy | Bridge browser-side tools to servers without CORS headers. ---- - -Async HTTP client without curl required. Uses sockets when curl is missing, supports concurrent requests and streaming responses. - -## Why this exists - -

    A plugin installer starts with one request to download plugin.zip. A migration then adds progress reporting, a ten-request media window, resumable downloads, and a remote ZIP reader that feeds ZipFilesystem directly. Those workflows need the same request API from the first GET to the final streamed archive.

    - -

    The HttpClient component gives the toolkit a small request/response model, middleware for redirects and caching, concurrent fetches, and response bodies exposed as byte streams. It runs through curl when PHP provides curl and through pure PHP sockets when it does not. Callers keep the same code path.

    - -

    Use it to fetch plugin metadata, submit import callbacks, mirror a media library, read a WXR export, or pipe a remote archive into Zip and Filesystem code.

    - -## GET a URL - -

    Network access in the demo runtime. Live request examples show the real API, but outbound HTTP in browser sandboxes may require a CORS proxy.

    - -

    The smallest flow has three steps: create a request, wait until headers arrive, then consume the body stream. This is intentionally close to the Fetch API shape, but the body is a toolkit byte stream instead of a buffered string.

    - - -```php -fetch( new Request( 'https://example.com/' ) ); - -$response = $stream->await_response(); -echo "status: " . $response->status_code . "\n"; -echo "first 80 bytes: " . substr( $stream->consume_all(), 0, 80 ) . "\n"; -``` - -## POST to a URL - -

    Uploads use the same shape. The only difference is that the request declares a method, request headers, and an upload body stream. Here the body is form-encoded text wrapped in MemoryPipe; a file upload could provide a file-backed read stream instead.

    - - -```php - 'Hello', - 'tags' => 'http,php', - ), - '', - '&' -); - -$client = new Client(); -$request = new Request( 'https://httpbin.org/post', array( - 'method' => 'POST', - 'headers' => array( - 'content-type' => 'application/x-www-form-urlencoded', - 'content-length' => (string) strlen( $payload ), - ), - 'body_stream' => new MemoryPipe( $payload ), -) ); - -$response = $client->fetch( $request )->json(); -echo "Server saw form title: " . $response['form']['title'] . "\n"; -``` - -## Build a JSON request object - -

    A Request is just data until a client enqueues it. That makes it easy to test request construction without network access. The constructor normalizes headers, calculates content-length when the body stream has a known length, and moves URL credentials into an Authorization header.

    - - -```php - 'Hello', - 'tags' => array( 'docs', 'php' ), -) ) ); -$body->close_writing(); - -$request = new Request( 'https://user:secret@api.example.test/posts', array( - 'method' => 'POST', - 'headers' => array( 'content-type' => 'application/json' ), - 'body_stream' => $body, -) ); - -echo $request->method . ' ' . $request->url . "\n"; -echo "content-type: " . $request->get_header( 'content-type' ) . "\n"; -echo "content-length: " . $request->get_header( 'content-length' ) . "\n"; -echo "authorization: " . substr( $request->get_header( 'authorization' ), 0, 10 ) . "...\n"; -``` - - -``` -POST https://api.example.test/posts -content-type: application/json -content-length: 39 -authorization: Basic dXNl... -``` - -## Parse response headers - -

    Most applications receive Response objects from await_response(). Transports, middleware, and tests sometimes need the lower-level parser: Response::from_http_headers() turns raw HTTP header bytes into normalized status and case-insensitive headers.

    - - -```php -status_code . ' ' . $response->get_reason_phrase() . "\n"; -echo "ok: " . ( $response->ok() ? 'yes' : 'no' ) . "\n"; -echo "type: " . $response->get_header( 'CONTENT-TYPE' ) . "\n"; -echo "size: " . $response->total_bytes . " bytes\n"; -``` - - -``` -status: 201 Created -ok: yes -type: application/json -size: 27 bytes -``` - -## Pick the right reading style - -

    There are three common ways to consume a response. Start simple, then move down the table only when the workflow demands it.

    - -
    StyleUse whenTradeoff
    consume_all() or json()Small HTML, JSON, or API responses.Buffers the full body.
    Client::await_next_event()Progress bars, streaming to disk, queues, failure handling.You own the event loop.
    Filesystem and parser compositionRemote ZIPs, WXR files, import pipelines.Requires a stream-aware consumer.
    - -## Choose a transport - -

    The transport is the I/O backend. It should not change your request, response, redirect, cache, or stream code; it only changes how bytes move across the network.

    - -
    TransportWhat it doesWhen to choose it
    autoUses curl when loaded, otherwise sockets.Application default. Best when you want portability and the fastest available backend.
    socketsUses PHP stream sockets, no curl extension.Tests, Playground-style runtimes, hosts where curl is unavailable, or proving the dependency-free path works.
    curlUses the curl extension.Hosts where curl is available and you want to compare behavior or performance explicitly.
    - -

    concurrency, timeout_ms, cache_dir, redirects, and response streaming sit above the transport, so the examples later on work with either backend.

    - - -```php - 'auto' ). - -$portable = new Client( array( - 'transport' => 'sockets', -) ); - -if ( extension_loaded( 'curl' ) ) { - $curl = new Client( array( - 'transport' => 'curl', - ) ); -} -``` - -## Follow redirects and inspect the final request - -

    Redirects are middleware, not transport behavior. The client follows up to five redirects by default. The original Request keeps a chain to the final request, so importers can log where a source URL actually landed.

    - - -```php -fetch( $request ); -$response = $stream->await_response(); -$stream->consume_all(); - -$final = $request->latest_redirect(); -echo "original: " . $request->url . "\n"; -echo "final: " . $final->url . "\n"; -echo "status: " . $response->status_code . "\n"; -``` - -## Cache repeatable GET responses - -

    Pass cache_dir to add disk caching for cacheable GET and HEAD responses. Fresh cached responses replay the same header/body events as a network response, so crawlers and importers do not need a separate cache code path. Non-GET requests invalidate matching cache entries instead of being cached.

    - - -```php - $cache_dir ) ); -$url = 'https://httpbin.org/cache/60'; - -for ( $i = 1; $i <= 2; $i++ ) { - $stream = $client->fetch( new Request( $url ) ); - $response = $stream->await_response(); - $body = $stream->consume_all(); - echo "request {$i}: HTTP " . $response->status_code . ', body=' . strlen( $body ) . " bytes\n"; -} - -echo "cache files: " . count( glob( $cache_dir . '/*' ) ) . "\n"; -``` - -## Handle failures without losing the queue - -

    Failures arrive as events. That lets a crawler, importer, package installer, or media frontloader log one bad URL and keep processing the rest of the queue. Treat failure handling as part of the event loop, not as one global try/catch around the whole batch.

    - - -```php - 5000 ) ); -$client->enqueue( array( - new Request( 'https://example.com/', array( 'method' => 'HEAD' ) ), - new Request( 'https://example.invalid/missing' ), -) ); - -while ( $client->await_next_event() ) { - $request = $client->get_request(); - $event = $client->get_event(); - - if ( Client::EVENT_GOT_HEADERS === $event ) { - echo "ok: " . $request->url . " HTTP " . $request->response->status_code . "\n"; - } elseif ( Client::EVENT_FAILED === $event ) { - echo "failed: " . $request->url . "\n"; - } elseif ( Client::EVENT_FINISHED === $event ) { - echo "finished: " . $request->url . "\n"; - } -} -``` - -## Monitor download progress - -

    When you care about progress, use the event loop directly. Count bytes from each EVENT_BODY_CHUNK_AVAILABLE event and compare them with Content-Length when the server provides one.
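    The report-every-25% throttle is plain arithmetic and easy to unit-test on its own. A minimal sketch (the helper name is hypothetical, not part of the component):

```php
<?php
// Hypothetical helper: decide whether to report progress, stepping in 25% increments.
// Callers only invoke this when the server provided a nonzero Content-Length.
function progress_step( $downloaded, $total, $last_step ) {
    $step = min( 100, (int) floor( $downloaded / $total * 100 ) );
    if ( $step >= $last_step + 25 || 100 === $step ) {
        return $step; // report, and remember this step
    }
    return $last_step; // stay quiet
}

$last = -1;
foreach ( array( 100, 2600, 5000, 9900, 10000 ) as $downloaded ) {
    $last = progress_step( $downloaded, 10000, $last );
}
echo $last . "\n"; // 100
```

    Keeping the throttle pure makes the event-loop handler a one-liner: update the counter, call the helper, print only when the returned step changes.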

    - - -```php -enqueue( array( $request ) ); - -$downloaded = 0; -$last_step = -1; -@unlink( $dest ); - -while ( $client->await_next_event() ) { - $event = $client->get_event(); - $request = $client->get_request(); - - if ( Client::EVENT_GOT_HEADERS === $event ) { - echo "status: " . $request->response->status_code . "\n"; - continue; - } - - if ( Client::EVENT_BODY_CHUNK_AVAILABLE === $event ) { - $chunk = $client->get_response_body_chunk(); - $downloaded += strlen( $chunk ); - file_put_contents( $dest, $chunk, FILE_APPEND ); - - $total = $request->response->total_bytes; - if ( $total ) { - $step = min( 100, (int) floor( $downloaded / $total * 100 ) ); - if ( $step >= $last_step + 25 || 100 === $step ) { - echo "progress: {$step}% ({$downloaded}/{$total} bytes)\n"; - $last_step = $step; - } - } else { - echo "downloaded: {$downloaded} bytes\n"; - } - continue; - } - - if ( Client::EVENT_FINISHED === $event ) { - echo "saved: {$dest}\n"; - } elseif ( Client::EVENT_FAILED === $event ) { - echo "failed: " . $request->error->message . "\n"; - } -} -``` - -## Keep a sliding window of 10 requests - -

    For large queues, do not enqueue everything at once. Keep at most ten active requests, enqueue another as each one finishes, and let the client multiplex only that window.
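    The windowing discipline itself is independent of the transport. A toy synchronous scheduler, with a callback standing in for the real fetch, shows the fill-then-drain shape (all names here are hypothetical):

```php
<?php
// Toy sliding-window scheduler: at most $window jobs "in flight" at once.
function run_window( array $queue, $window, callable $run_job ) {
    $active   = array();
    $max_seen = 0;
    $next_id  = 0;
    while ( $queue || $active ) {
        // Fill the window back up as slots free.
        while ( $queue && count( $active ) < $window ) {
            $active[ $next_id++ ] = array_shift( $queue );
        }
        $max_seen = max( $max_seen, count( $active ) );
        // Simulate one job finishing (in the real loop: EVENT_FINISHED / EVENT_FAILED).
        $id = array_key_first( $active );
        $run_job( $active[ $id ] );
        unset( $active[ $id ] );
    }
    return $max_seen;
}

$done = 0;
$max  = run_window( range( 1, 25 ), 10, function () use ( &$done ) { $done++; } );
echo "done={$done}, peak active={$max}\n"; // done=25, peak active=10
```

    The real loop replaces "simulate one job finishing" with the EVENT_FINISHED / EVENT_FAILED branch, but the invariant is the same: never more than the window size in flight.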

    - - -```php - 10 ) ); -$pending = $urls; -$active = array(); -$done = 0; - -$enqueue_next = function () use ( &$pending, &$active, $client ) { - if ( ! $pending ) { - return; - } - $url = array_shift( $pending ); - $request = new Request( $url, array( 'method' => 'HEAD' ) ); - $active[ $request->id ] = $request; - $client->enqueue( array( $request ) ); -}; - -for ( $i = 0; $i < 10; $i++ ) { - $enqueue_next(); -} - -while ( $active && $client->await_next_event() ) { - $request = $client->get_request(); - $event = $client->get_event(); - - if ( Client::EVENT_GOT_HEADERS === $event ) { - echo "headers {$request->id}: " . $request->response->status_code . "\n"; - continue; - } - - if ( Client::EVENT_FINISHED === $event || Client::EVENT_FAILED === $event ) { - unset( $active[ $request->id ] ); - $done++; - echo "finished {$done}/25, active=" . count( $active ) . "\n"; - $enqueue_next(); - } -} -``` - -## Resume a partial download - -

    Resuming is an HTTP contract between you and the server. Save what you already have, send a Range request for the remaining bytes, and append only if the server returns 206 Partial Content.
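    The two decisions that matter are the header you send and the status you accept. A small sketch as hypothetical helpers (names are illustrations, not toolkit API):

```php
<?php
// Resume from whatever is already on disk: request the open-ended tail range.
function range_header_for_resume( $bytes_already_on_disk ) {
    return 'bytes=' . $bytes_already_on_disk . '-';
}

// Only 206 Partial Content means the server honored the Range header.
// A 200 means it ignored it and is sending the full body from byte 0,
// so appending would corrupt the file — discard and restart instead.
function should_append( $status_code ) {
    return 206 === $status_code;
}

echo range_header_for_resume( 32768 ) . "\n"; // bytes=32768-
var_dump( should_append( 206 ) );             // bool(true)
var_dump( should_append( 200 ) );             // bool(false)
```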

    - - -```php - array( 'range' => 'bytes=0-32767' ), -) ); -$stream = $client->fetch( $first ); -$response = $stream->await_response(); -file_put_contents( $dest, $stream->consume_all() ); - -if ( 206 !== $response->status_code ) { - echo "Server did not honor Range; start over with a full download.\n"; - exit; -} - -$downloaded = filesize( $dest ); -echo "partial file: {$downloaded} bytes\n"; - -$resume = new Request( $url, array( - 'headers' => array( 'range' => 'bytes=' . $downloaded . '-' ), -) ); -$stream = $client->fetch( $resume ); -$response = $stream->await_response(); - -if ( 206 !== $response->status_code ) { - echo "Server did not resume; discard partial file and retry from byte 0.\n"; - exit; -} - -while ( ! $stream->reached_end_of_data() ) { - $n = $stream->pull( 8192 ); - if ( 0 === $n ) { - break; - } - file_put_contents( $dest, $stream->consume( $n ), FILE_APPEND ); -} - -echo "complete file: " . filesize( $dest ) . " bytes\n"; -echo "saved: {$dest}\n"; -``` - -## Stream-unzip a remote archive - -

    Mount the remote archive with ZipFilesystem, then copy it into any writable filesystem. SeekableRequestReadStream caches received bytes to a temporary file so ZipFilesystem can read the central directory and seek to entries without first writing the ZIP yourself.

    - - -```php - $client ) -); - -$response = $reader->await_response(); -if ( ! $response->ok() ) { - echo "HTTP " . $response->status_code . "\n"; - exit; -} - -$zip = ZipFilesystem::create( $reader ); -$local = LocalFilesystem::create( $root ); - -copy_between_filesystems( array( - 'source_filesystem' => $zip, - 'source_path' => '/', - 'target_filesystem' => $local, - 'target_path' => '/', -) ); - -$tree = ls_recursive( $local, '/' ); -$files = 0; -array_walk_recursive( $tree, function ( $value, $key ) use ( &$files ) { - if ( 'type' === $key && 'file' === $value ) { - $files++; - } -} ); - -echo "extracted {$files} files\n"; -echo "root: {$root}\n"; -``` - -## Parallel fan-out: fetch many URLs at once - -

    Enqueue a batch of requests and react to events as they fire. The client multiplexes them — total wall time is roughly the slowest request, not the sum.

    - - -```php -enqueue( array_map( function ( $url ) { - return new Request( $url, array( 'method' => 'HEAD' ) ); -}, $urls ) ); - -$results = array(); -while ( $client->await_next_event() ) { - $request = $client->get_request(); - if ( Client::EVENT_GOT_HEADERS === $client->get_event() ) { - $results[ $request->url ] = $request->response->status_code; - } elseif ( Client::EVENT_FAILED === $client->get_event() ) { - $results[ $request->url ] = 'ERR ' . $request->error->message; - } -} - -foreach ( $results as $url => $status ) { - printf( "%-40s %s\n", $url, $status ); -} -``` - -## Stream a download to disk without OOM - -

    Process the body chunk-by-chunk via the event loop. Memory stays flat regardless of file size.

    - - -```php -enqueue( array( new Request( 'https://wordpress.org/' ) ) ); - -$bytes = 0; -@unlink( $dest ); - -while ( $client->await_next_event() ) { - switch ( $client->get_event() ) { - case Client::EVENT_BODY_CHUNK_AVAILABLE: - $chunk = $client->get_response_body_chunk(); - $bytes += strlen( $chunk ); - file_put_contents( $dest, $chunk, FILE_APPEND ); - break; - case Client::EVENT_FINISHED: - echo "Wrote {$bytes} bytes to {$dest}\n"; - break; - } -} - -echo "Peak memory: " . round( memory_get_peak_usage( true ) / 1024 / 1024, 2 ) . " MB\n"; -``` diff --git a/bin/_docs_components/httpserver.md b/bin/_docs_components/httpserver.md deleted file mode 100644 index 2967dfd6a..000000000 --- a/bin/_docs_components/httpserver.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -slug: httpserver -title: HttpServer -install: wp-php-toolkit/http-server - -see_also: cli | CLI | Expose a local browser UI from a command-line tool. -see_also: httpclient | HttpClient | Test client code against a small local fixture server. ---- - -A minimal blocking TCP HTTP server in pure PHP. For CLI tools and tests, not for production traffic. - -## Why this exists - -

    Sometimes a PHP tool needs a tiny local HTTP surface: a test fixture server, a webhook receiver during development, a CLI tool with a browser UI, or a demo endpoint for another component. Pulling in a production web framework would obscure the example and add dependencies the toolkit avoids.

    - -

    The HttpServer component is intentionally small: a blocking TCP server, incoming request objects, and response writers. It is useful for local tools and tests. It is not a replacement for nginx, Apache, php-fpm, RoadRunner, Swoole, or a production application server.

    -
    -## Hello world on port 8080
    -
    -

    Run on your machine: the Playground sandbox does not allow processes to bind listening TCP ports. Save this snippet locally and run php hello-server.php.

    - - -```php -set_handler( function ( IncomingRequest $request, ResponseWriteStream $response ) { - $response->send_http_code( 200 ); - $response->send_header( 'Content-Type', 'text/plain' ); - $response->append_bytes( "Hello from " . $request->method . " " . $request->url . "\n" ); -} ); - -$server->serve( function ( $host, $port ) { - echo "Listening on http://{$host}:{$port}\n"; -} ); -``` - -## A tiny JSON router - -

    Run on your machine: needs a listening port. Once running, try curl localhost:8080/api/status.

    - -

    Build a CLI tool with a web UI by switching on the parsed path and method.

    - - -```php -set_handler( function ( IncomingRequest $request, ResponseWriteStream $response ) { - $path = $request->get_parsed_url()->pathname; - - if ( '/api/status' === $path ) { - $response->send_http_code( 200 ); - $response->send_header( 'Content-Type', 'application/json' ); - $response->append_bytes( json_encode( array( - 'ok' => true, - 'pid' => getmypid(), - 'memory' => memory_get_usage( true ), - ) ) ); - return; - } - - if ( '/api/echo' === $path && 'POST' === $request->method ) { - $body = ''; - while ( ! $request->body_stream->reached_end_of_data() ) { - $n = $request->body_stream->pull( 4096 ); - if ( $n > 0 ) $body .= $request->body_stream->consume( $n ); - } - $response->send_http_code( 200 ); - $response->send_header( 'Content-Type', 'text/plain' ); - $response->append_bytes( $body ); - return; - } - - $response->send_http_code( 404 ); - $response->append_bytes( "Not found\n" ); -} ); - -$server->serve(); -``` - -## Buffered response with auto Content-Length - -

    Use BufferingResponseWriter when you want the framework to compute Content-Length for you, or when the runtime is CGI-shaped and expects the full body up front. This one runs anywhere — no socket required.

    - - -```php -send_http_code( 200 ); -$writer->send_header( 'Content-Type', 'text/html' ); -$writer->append_bytes( 'Hi

    Hello

    ' ); -$writer->append_bytes( '

    Buffered body, sent at the end.

    ' ); - -ob_start(); -$writer->close_writing(); -$response_body = ob_get_clean(); - -echo "headers before send:\n"; -foreach ( $writer->get_buffered_headers() as $name => $value ) { - echo "{$name}: {$value}\n"; -} -echo "\nbody:\n" . $response_body; -``` - - -``` -headers before send: -Content-Type: text/html - -body: -Hi

    Hello

    Buffered body, sent at the end.

    -``` diff --git a/bin/_docs_components/markdown.md b/bin/_docs_components/markdown.md deleted file mode 100644 index 37f09baea..000000000 --- a/bin/_docs_components/markdown.md +++ /dev/null @@ -1,224 +0,0 @@ ---- -slug: markdown -title: Markdown -install: wp-php-toolkit/markdown - -credit_title: Built on league/commonmark -credit_body: | - Markdown parsing is delegated to league/commonmark; YAML frontmatter is handled by webuni/front-matter. The toolkit's own work is the bridge between CommonMark's AST and WordPress block markup, in both directions. - -see_also: blockparser | BlockParser | Understand the block tree created from Markdown output. -see_also: html | HTML | Rewrite rendered HTML fragments without using DOMDocument. -see_also: dataliberation | DataLiberation | Turn Markdown folders into import/export streams. ---- - -Bidirectional converter between Markdown and WordPress block markup. Useful for moving content between Markdown files and WordPress while preserving the structures both formats can express. - -## Why this exists - -

    Many publishing workflows start in Markdown: documentation sites, static-site generators, Git-backed editorial workflows, Obsidian vaults, and developer notes. WordPress stores editor content as block markup. Moving between those worlds by string replacement loses metadata and quickly breaks on lists, tables, code blocks, and frontmatter.

    - -

    The Markdown component provides a structured bridge. MarkdownConsumer turns Markdown plus frontmatter into block markup and metadata; MarkdownProducer turns supported block markup back into Markdown. The conversion is meant for practical content workflows, not byte-identical round-tripping of every custom block attribute.

    -
    -## Markdown to blocks
    -
    -

    Feed Markdown into MarkdownConsumer, get block markup back. The result is a BlocksWithMetadata object (defined in WordPress\DataLiberation\DataFormatConsumer — the shared shape every DataFormatConsumer in the toolkit emits) that holds both the rendered blocks and any frontmatter parsed from the document.

    - - -```php -consume(); -echo $result->get_block_markup(); -``` - - -``` - -

    Hello

    - - - -

    Welcome to WordPress.

    - -``` - -## Round-trip: blocks back to Markdown - -

    Pair MarkdownProducer with MarkdownConsumer to convert in either direction. Round-tripping is lossy for block attributes that have no Markdown representation (custom classes, alignment), so do not expect byte-perfect equality.

    - - -```php -consume(); -$markdown = ( new MarkdownProducer( $blocks ) )->produce(); - -echo $markdown; -``` - - -``` -## Round trip - -- one -- two -- three -``` - -## Reading YAML frontmatter as post meta - -

    Frontmatter keys come back as arrays so a single key can hold multiple values. Use get_meta_value() when you only want the first scalar.

    - - -```php -consume(); - -echo 'Title: ' . $consumer->get_meta_value( 'post_title' ) . "\n"; -echo 'Status: ' . $consumer->get_meta_value( 'post_status' ) . "\n"; -$metadata = $consumer->get_all_metadata(); -echo 'Tags: ' . implode( ', ', $metadata['tags'][0] ) . "\n"; -``` - - -``` -Title: The Name of the Wind -Status: publish -Tags: fantasy, kingkiller -``` - -## Migrating an Obsidian or Hugo folder of Markdown - -

    Walk a directory of .md files (Obsidian vault, Hugo content/, Jekyll _posts) and emit one block-markup record per file.

    - - -```php -consume(); - $title = $consumer->get_meta_value( 'title' ); - if ( ! $title ) $title = basename( $path, '.md' ); - echo "=== $title ($path) ===\n"; - echo substr( $consumer->get_block_markup(), 0, 120 ) . "...\n\n"; -} -``` - - -``` -=== roadmap (/tmp//roadmap.md) === - -

    Roadmap

    - - - -

    Hello world.

    - - -... -``` - -## Counting blocks produced by a Markdown document - -

    After conversion, the block markup is plain WordPress block markup, so parse_blocks() works on it directly. This is the standard way to introspect what the converter emitted before saving it to the database.
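    The queue walk works on any parse_blocks()-shaped tree. A standalone sketch with a hand-built tree in that shape, so it runs without WordPress (the counting function is hypothetical):

```php
<?php
// Count block names with a breadth-first queue, mirroring the shape
// parse_blocks() returns: each block has 'blockName' and 'innerBlocks'.
function count_block_names( array $blocks ) {
    $counts = array();
    $queue  = $blocks;
    while ( $queue ) {
        $block = array_shift( $queue );
        if ( null !== $block['blockName'] ) {
            $name            = $block['blockName'];
            $counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1;
        }
        foreach ( $block['innerBlocks'] as $inner ) {
            $queue[] = $inner;
        }
    }
    return $counts;
}

$tree = array(
    array( 'blockName' => 'core/quote', 'innerBlocks' => array(
        array( 'blockName' => 'core/paragraph', 'innerBlocks' => array() ),
    ) ),
    array( 'blockName' => 'core/paragraph', 'innerBlocks' => array() ),
    array( 'blockName' => null, 'innerBlocks' => array() ), // whitespace-only node
);
print_r( count_block_names( $tree ) );
```

    Note the null blockName guard: parse_blocks() emits nameless freeform nodes for whitespace between blocks, and they should not be counted.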

    - - -````php - A quote. -MD; - -$blocks = ( new MarkdownConsumer( $md ) )->consume()->get_block_markup(); -$counts = array(); -$queue = parse_blocks( $blocks ); - -while ( $queue ) { - $block = array_shift( $queue ); - if ( null !== $block['blockName'] ) { - $name = $block['blockName']; - $counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1; - } - foreach ( $block['innerBlocks'] as $inner_block ) { - $queue[] = $inner_block; - } -} -foreach ( $counts as $name => $count ) { - echo "{$name}: {$count}\n"; -} -```` - - -``` -core/heading: 1 -core/paragraph: 2 -core/table: 1 -core/code: 1 -core/quote: 1 -``` diff --git a/bin/_docs_components/merge.md b/bin/_docs_components/merge.md deleted file mode 100644 index a6a582f0e..000000000 --- a/bin/_docs_components/merge.md +++ /dev/null @@ -1,247 +0,0 @@ ---- -slug: merge -title: Merge -install: wp-php-toolkit/merge - -see_also: git | Git | Merge file contents discovered through repository history. -see_also: markdown | Markdown | Resolve file-based editorial workflows before converting to blocks. -see_also: dataliberation | DataLiberation | Make content synchronization conflicts visible. ---- - -Three-way merge and diff. Pluggable differ + merger + optional validator. - -## Why this exists - -

    Content synchronization needs more than "last write wins." A Markdown file changes in Git while the same post changes in WordPress. A generated config changes through both a CLI tool and a UI. In those cases you need a common ancestor, two edited versions, and a way to explain conflicts to a human.

    - -

    The Merge component provides the diff and three-way merge primitives used by those workflows. The default examples are line-oriented because that is the most familiar shape, but the strategy is intentionally pluggable: choose the differ, choose the merger, and optionally validate the merged result before accepting it.

    - -

    Use the merge result to auto-accept independent edits and to show structured conflicts when a person must decide.

    -
    -## Diff two strings line by line
    -
    -

    Feed two strings to LineDiffer and inspect the operations. Every get_changes() entry is a [op, text] pair.
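    Because each change is just an [op, text] pair, downstream code can fold over the list without knowing which differ produced it. A sketch that tallies a diff stat; the string ops here are stand-ins for the Diff class constants, and the function name is hypothetical:

```php
<?php
// Fold a list of [op, text] change pairs into an added/removed line tally.
// '=', '-', '+' stand in for Diff::DIFF_EQUAL / DIFF_DELETE / DIFF_INSERT.
function diff_stat( array $changes ) {
    $added   = 0;
    $removed = 0;
    foreach ( $changes as $change ) {
        list( $op, $text ) = $change;
        $lines = substr_count( $text, "\n" );
        if ( '+' === $op ) { $added   += $lines; }
        if ( '-' === $op ) { $removed += $lines; }
    }
    return array( $added, $removed );
}

list( $added, $removed ) = diff_stat( array(
    array( '=', "alpha\n" ),
    array( '-', "beta\n" ),
    array( '+', "BETA\n" ),
    array( '+', "delta\n" ),
) );
echo "+{$added} -{$removed}\n"; // +2 -1
```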

    - - -```php -diff( - "alpha\nbeta\ngamma\n", - "alpha\nBETA\ngamma\ndelta\n" -); - -$labels = array( Diff::DIFF_EQUAL => '=', Diff::DIFF_DELETE => '-', Diff::DIFF_INSERT => '+' ); -foreach ( $diff->get_changes() as $change ) { - echo $labels[ $change[0] ] . ' ' . rtrim( $change[1] ) . "\n"; -} -``` - - -``` -= alpha -- beta -+ BETA -= gamma -+ delta -= -``` - -## Render a unified patch - -

    format_as_git_patch() produces output that mirrors git diff, including hunk headers — handy for emails, CI annotations, or a "what changed?" panel.

    - - -```php -diff( $old, $new ); -echo $diff->format_as_git_patch( array( - 'a_source' => 'a/post.yml', - 'b_source' => 'b/post.yml', -) ); -``` - - -``` -diff --git a/post.yml b/post.yml ---- a/post.yml -+++ b/post.yml -@@ -1,4 +1,5 @@- title: Hello -+ title: Hello, world - author: Alice -- status: draft -+ status: published -+ tags: greeting - -``` - -## Three-way merge with no conflicts - -

    The classic case: each branch changes a different region. Pass the common ancestor plus both edits to MergeStrategy::merge() and read the merged result.
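    The core rule of a three-way merge fits in a few lines. A toy per-line version makes it concrete (real mergers operate on diffed regions, not fixed line positions, so this is only the decision rule, not the component's algorithm):

```php
<?php
// Toy three-way rule for one line: keep whichever side changed relative to
// the base; if both sides changed it differently, that's a conflict.
function merge_line( $base, $ours, $theirs ) {
    if ( $ours === $theirs )  { return array( 'ok', $ours ); }       // same edit, or no edit
    if ( $ours === $base )    { return array( 'ok', $theirs ); }     // only theirs changed
    if ( $theirs === $base )  { return array( 'ok', $ours ); }       // only ours changed
    return array( 'conflict', null );                                // both changed
}

var_dump( merge_line( 'intro', 'intro updated', 'intro' ) );
var_dump( merge_line( 'line 2', 'line 2 from Alice', 'line 2 from Bob' ) );
```

    Everything else in a real merger is about finding which regions correspond across the three versions; once regions are aligned, this is the rule applied to each one.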

    - - -```php -merge( - "intro\nbody\noutro\n", - "intro updated\nbody\noutro\n", - "intro\nbody\noutro\nappendix\n" -); - -echo $result->has_conflicts() ? "conflicts!\n" : "clean merge:\n"; -echo $result->get_merged_content(); -``` - - -``` -clean merge: -intro updated -body -outro -appendix -``` - -## Inspect and surface conflicts - -

    When both sides edit the same region, the merger produces a MergeConflict. The merged content carries Git-style markers, but the structured get_conflicts() output is what you want for a UI that lets the user pick a side.

    - - -```php -merge( - "line 1\nline 2\n", - "line 1\nline 2 from Alice\n", - "line 1\nline 2 from Bob\n" -); - -if ( $result->has_conflicts() ) { - foreach ( $result->get_conflicts() as $c ) { - echo "ours: " . trim( $c->ours ) . "\n"; - echo "theirs: " . trim( $c->theirs ) . "\n"; - } -} -echo "\n--- merged content with markers ---\n"; -echo $result->get_merged_content(); -``` - - -``` -ours: line 2 from Alice -theirs: line 2 from Bob - ---- merged content with markers --- -line 1 - -<<<<<<< HEAD -line 2 from Alice - -======= -line 2 from Bob - ->>>>>>> incoming -``` - -## Sync a Markdown folder against an edited DB copy - -

    A real-world scenario: posts live both in a Git-tracked Markdown folder and in WordPress, and someone edits each. Three-way-merge each post against its common ancestor.

    - - -```php - array( - 'base' => "# Hello\nDraft body.\n", - 'disk' => "# Hello\nDraft body, expanded on disk.\n", - 'db' => "# Hello\nDraft body.\nNew section from the editor.\n", - ), - 'about.md' => array( - 'base' => "# About\nWho we are.\n", - 'disk' => "# About\nWho *they* are.\n", - 'db' => "# About\nWho we really are.\n", - ), -); - -foreach ( $posts as $name => $sides ) { - $result = $strategy->merge( $sides['base'], $sides['disk'], $sides['db'] ); - echo "=== {$name} ===\n"; - echo $result->has_conflicts() ? "(conflict — needs review)\n" : "(auto-merged)\n"; - echo $result->get_merged_content() . "\n"; -} -``` - - -``` -=== hello.md === -(conflict — needs review) -# Hello - -<<<<<<< HEAD -Draft body, expanded on disk. - -======= -New section from the editor. - ->>>>>>> incoming - - -=== about.md === -(conflict — needs review) -# About - -<<<<<<< HEAD -Who *they* are. - -======= -Who we really are. - ->>>>>>> incoming -``` diff --git a/bin/_docs_components/polyfill.md b/bin/_docs_components/polyfill.md deleted file mode 100644 index ede94bb91..000000000 --- a/bin/_docs_components/polyfill.md +++ /dev/null @@ -1,174 +0,0 @@ ---- -slug: polyfill -title: Polyfill -install: wp-php-toolkit/polyfill - -credit_title: WordPress-shaped behavior -credit_body: | - When WordPress is loaded, every function in this component defers to WordPress. The standalone implementations of esc_html(), add_filter(), __(), and friends match WordPress core's behavior so the same code runs inside and outside the platform. - -see_also: html | HTML | Run WordPress-shaped escaping and translation helpers beside HTML processors. -see_also: blockparser | BlockParser | Keep standalone block tooling familiar outside WordPress. ---- - -PHP 8 string functions on PHP 7.2+, WordPress hook stubs, and translation/escaping passthroughs so toolkit code runs without WordPress. - -## Why this exists - -

    A lot of WordPress-adjacent code wants to call esc_html(), __(), or apply_filters() without booting WordPress. The polyfill component provides minimal but real implementations so that code runs unchanged outside WordPress, and stays out of the way when WordPress is loaded (every function uses function_exists() guards).

    -
    -## PHP 8 string functions on PHP 7.2
    -
    -

    The polyfills define str_contains, str_starts_with, str_ends_with, and array_key_first only when missing.
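    The guard pattern is small enough to show whole. A sketch of a define-only-when-missing polyfill matching str_contains()'s documented PHP 8 semantics (this is the shape of the technique, not a copy of the component's source):

```php
<?php
// Define str_contains only when the runtime (PHP < 8.0) lacks it.
// On PHP 8+ the native function wins and this block is skipped entirely.
if ( ! function_exists( 'str_contains' ) ) {
    function str_contains( $haystack, $needle ) {
        // PHP 8 semantics: the empty needle is found in every string.
        return '' === $needle || false !== strpos( $haystack, $needle );
    }
}

var_dump( str_contains( 'WordPress', 'Press' ) ); // bool(true)
var_dump( str_contains( 'WordPress', '' ) );      // bool(true)
var_dump( str_contains( 'WordPress', 'press' ) ); // bool(false)
```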

    - - -```php - 1, 'beta' => 2 ) ); -echo "first key: {$first_key}\n"; -``` - - -``` -bool(true) -bool(true) -bool(true) -first key: alpha -``` - -## Escaping and translation stubs - -

    Pass-through implementations let you write code that looks WordPressy and runs anywhere.

    - - -```php -alert("xss")' ) . "\n"; -echo esc_attr( 'a "quoted" value' ) . "\n"; -echo esc_url( 'https://example.com/?a=1&b=2' ) . "\n"; -``` - - -``` -Hello, world -<script>alert("xss")</script> -a "quoted" value -https://example.com/?a=1&b=2 -``` - -## A simple filter chain - -

    The hook system is a real implementation of the WordPress filter API: registered callbacks get applied in priority order, and each one transforms the running value.

    - - -```php - -``` -my-post-title -``` - -## Priority ordering and multi-arg passing - -

    Lower priority numbers run first. The fourth argument to add_filter controls how many context values get passed to the callback.
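    Priority ordering is just a sorted dispatch table. A minimal registry sketch, not the component's implementation, that shows why lower numbers run first (the closure names are hypothetical to avoid colliding with the real add_filter/apply_filters):

```php
<?php
// Minimal filter registry: callbacks run in ascending priority order,
// each receiving the previous callback's return value.
$filters = array();

$my_add_filter = function ( $hook, callable $cb, $priority = 10 ) use ( &$filters ) {
    $filters[ $hook ][ $priority ][] = $cb;
};

$my_apply_filters = function ( $hook, $value ) use ( &$filters ) {
    if ( empty( $filters[ $hook ] ) ) {
        return $value;
    }
    ksort( $filters[ $hook ] ); // lower priority numbers run first
    foreach ( $filters[ $hook ] as $callbacks ) {
        foreach ( $callbacks as $cb ) {
            $value = $cb( $value );
        }
    }
    return $value;
};

$my_add_filter( 'title', 'strtolower', 20 ); // registered first, runs second
$my_add_filter( 'title', 'trim', 5 );        // registered second, runs first
echo $my_apply_filters( 'title', '  My Post  ' ) . "\n"; // my post
```

    Registration order does not matter; only the priority key does. The real implementation adds accepted-arg counts on top of this same sorted-table idea.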

    - - -```php -{$html}"; -}, 10, 2 ); - -add_filter( 'render_price', function ( $html, $price, $currency ) { - if ( 'EUR' === $currency ) return $html . ' EUR'; - return $html . " {$currency}"; -}, 20, 3 ); - -echo apply_filters( 'render_price', '19.99', 19.99, 'EUR' ) . "\n"; -``` - - -``` -19.99 EUR (EUR markup) -``` - -## Hook-based extension points in standalone libraries - -

    Use do_action and apply_filters as cheap extension points in your own code, without depending on WordPress.

    - - -```php -process( array( 'email' => ' USER@EXAMPLE.COM ' ) ); -$pipeline->process( array( 'email' => 'OTHER@example.com' ) ); - -echo implode( "\n", $log ) . "\n"; -``` - - -``` -user@example.com -other@example.com -``` diff --git a/bin/_docs_components/xml.md b/bin/_docs_components/xml.md deleted file mode 100644 index a4cf4c620..000000000 --- a/bin/_docs_components/xml.md +++ /dev/null @@ -1,200 +0,0 @@ ---- -slug: xml -title: XML -install: wp-php-toolkit/xml - -see_also: dataliberation | DataLiberation | Read and write WXR-sized WordPress exports as entities. -see_also: encoding | Encoding | Validate and scrub text before strict XML processing. -see_also: bytestream | ByteStream | Keep large XML reads incremental. ---- - -A streaming, namespace-aware XML processor in pure PHP. Read and modify huge feeds, WXR exports, ePub manifests, and Office Open XML parts without ever loading the document into memory and without depending on libxml2. - -## Why this exists - -

    SimpleXMLElement and DOMDocument both need libxml2 and both build a complete in-memory tree. XMLProcessor walks the document forward as a cursor, keeps modifications in a side buffer, and emits the full updated XML with get_updated_xml() only when you ask for it.

    - -

    This design came from WordPress-scale documents such as WXR exports. A migration may only need to rewrite wp:attachment_url values or bump a feed attribute, so the processor optimizes for targeted cursor edits instead of a full validating XML stack.

    - -

    Footgun: Namespace-aware methods use the namespace URI, not the prefix written in the tag. In WXR, get_attribute( 'wp', 'status' ) looks for a namespace literally named wp; for the usual WXR declaration you want get_attribute( 'http://wordpress.org/export/1.2/', 'status' ).
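    To find the right namespace name, read the xmlns declarations in the document itself. A naive sketch that maps prefixes to URIs from a root element (assumption: real namespace scoping is per-element and can be re-declared anywhere; this only reads one tag, and the function name is hypothetical):

```php
<?php
// Naive prefix → namespace-URI map from a single element's xmlns declarations.
function root_namespaces( $xml ) {
    preg_match_all( '/xmlns:([A-Za-z0-9_-]+)="([^"]*)"/', $xml, $m );
    return array_combine( $m[1], $m[2] );
}

$rss = '<rss xmlns:wp="http://wordpress.org/export/1.2/" xmlns:dc="http://purl.org/dc/elements/1.1/">';
$ns  = root_namespaces( $rss );

// Namespace-aware lookups match on this URI, not on the "wp" prefix.
echo $ns['wp'] . "\n"; // http://wordpress.org/export/1.2/
```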

    - -

    Footgun: In streaming mode next_tag() can return false because input ran out, not because the document ended. Check is_paused_at_incomplete_input() before assuming you're done.

    -
    -## Bump every price in a catalog
    -
    -

    Find each <book>, read its price, write a new one, emit the updated document.

    - - -```php - -PHP Internals -WordPress at Scale - -XML; - -$p = XMLProcessor::create_from_string( $xml ); -while ( $p->next_tag( 'book' ) ) { - $old = (float) $p->get_attribute( '', 'price' ); - $new = number_format( $old * 1.10, 2, '.', '' ); - $p->set_attribute( '', 'price', $new ); -} - -echo $p->get_updated_xml(); -``` - - -``` - -PHP Internals -WordPress at Scale - -``` - -## Read namespaced attributes from a WXR export - -

    WordPress's WXR commonly uses wp:, dc:, and content: prefixes bound to namespace names such as http://wordpress.org/export/1.2/. Pass that expanded namespace name, not the prefix; the processor handles whichever prefix the document actually uses.

    - - -```php - - - -Hello World -admin -42 -publish - -XML; - -$WP = 'http://wordpress.org/export/1.2/'; -$DC = 'http://purl.org/dc/elements/1.1/'; - -$p = XMLProcessor::create_from_string( $wxr ); -while ( $p->next_tag( 'item' ) ) { - while ( $p->next_token() ) { - if ( $p->is_tag_closer() && 'item' === $p->get_tag_local_name() ) break; - if ( ! $p->is_tag_opener() ) continue; - $ns = $p->get_tag_namespace(); - $local = $p->get_tag_local_name(); - $prefix = ( $WP === $ns ) ? 'wp/' : ( ( $DC === $ns ) ? 'dc/' : '' ); - echo "{$prefix}{$local}: "; - while ( $p->next_token() && '#text' !== $p->get_token_name() ) {} - echo trim( $p->get_modifiable_text() ) . "\n"; - } -} -``` - - -``` -title: Hello World -dc/creator: admin -wp/post_id: 42 -wp/status: publish -``` - -## Rewrite URLs across an entire WXR export - -

    Large WXR exports can hold many URLs in <link>, <guid>, and post content. Streaming the file lets you rewrite them all without loading the whole document into memory.

    - - -```php - -https://old.example.com -https://old.example.com/2024/post-1 -https://old.example.com/?p=1 - -XML; - -$from = 'https://old.example.com'; -$to = 'https://new.example.com'; - -$p = XMLProcessor::create_from_string( $wxr ); -$rewritten = 0; - -while ( $p->next_token() ) { - if ( '#text' !== $p->get_token_name() ) continue; - $text = $p->get_modifiable_text(); - if ( false === strpos( $text, $from ) ) continue; - $p->set_modifiable_text( str_replace( $from, $to, $text ) ); - $rewritten++; -} - -echo "rewrote {$rewritten} text nodes\n\n"; -echo $p->get_updated_xml(); -``` - - -``` -rewrote 3 text nodes - - -https://new.example.com -https://new.example.com/2024/post-1 -https://new.example.com/?p=1 - -``` - -## Parse OPML to extract feed URLs - -

    OPML is the format Feedly and many readers use to import/export feed lists. Flat, attribute-heavy XML — exactly what a tag processor handles best.

    - - -```php -My Feeds - - - - - -XML; - -$p = XMLProcessor::create_from_string( $opml ); -while ( $p->next_tag( 'outline' ) ) { - $url = $p->get_attribute( '', 'xmlUrl' ); - if ( null === $url ) continue; - echo $p->get_attribute( '', 'text' ) . "\t" . $url . "\n"; -} -``` - - -``` -Hacker News https://news.ycombinator.com/rss -LWN https://lwn.net/headlines/rss -WordPress https://wordpress.org/news/feed/ -``` diff --git a/bin/_docs_components/zip.md b/bin/_docs_components/zip.md deleted file mode 100644 index 03e7e6a99..000000000 --- a/bin/_docs_components/zip.md +++ /dev/null @@ -1,398 +0,0 @@ ---- -slug: zip -title: Zip -install: wp-php-toolkit/zip - -see_also: ../learn/02-streaming-archives.html | Tutorial — Streaming archives | Walk through ZIP and EPUB writers from the toolkit's worked example. -see_also: filesystem | Filesystem | Treat an archive like a swappable filesystem backend. -see_also: bytestream | ByteStream | Feed readers and writers without whole-file buffers. -see_also: httpclient | HttpClient | Stream downloaded archives into validation or extraction workflows. ---- - -Read and write ZIP archives in pure PHP — no libzip, no ZipArchive. Streams entries one at a time, so you can build EPUBs, .docx files, and multi-gigabyte plugin bundles without buffering the archive in memory. - -## Why this exists - -

    Common PHP ZIP workflows rely on the ZipArchive extension or shelling out to zip. Those are awkward in hosts without libzip, WebAssembly builds, and code paths that need to stream archive data through toolkit byte streams.

    - -

    The Zip component reads and writes Stored and Deflate archives in pure PHP. The decoder is pull-based, so listing the central directory of a 2 GB ZIP costs roughly the size of the directory itself. The encoder accepts any ByteWriteStream as a sink and writes one entry at a time.

    -
    -## Read a file out of a ZIP
    -
    -

    ZipFilesystem implements this toolkit's Filesystem interface, so once you wrap the byte reader you can call get_contents(), ls(), and is_dir() just like the other filesystem backends.

    - -

    Try this: after running the snippet, add a second append_file() call before $enc->close() for a notes.md entry, then call print_r( $zip->ls( '/' ) ) at the end. The directory listing reflects the new entry without re-reading the file.

    - - -```php -append_file( new FileEntry( array( - 'path' => 'readme.txt', - 'compression_method' => ZipDecoder::COMPRESSION_NONE, - 'body_reader' => new MemoryPipe( 'Hello from inside the zip.' ), -) ) ); -$enc->close(); -$out->close_writing(); - -$zip = ZipFilesystem::create( FileReadStream::from_path( $path ) ); -echo $zip->get_contents( 'readme.txt' ); -``` - - -``` -Hello from inside the zip. -``` - -## Build an EPUB from scratch - -

    An EPUB follows one strict ZIP rule: write the mimetype entry first and store it without compression. Deflate the rest of the archive normally.

    - -

    Gotcha: E-readers reject EPUBs whose mimetype entry has compression. Use COMPRESSION_NONE for that single entry.
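    The rule is checkable at the byte level: the archive must literally begin with a local file header whose compression-method field is 0 (Stored) and whose name is mimetype. A sketch that hand-packs such a header and verifies it (the checker name is hypothetical; offsets follow the ZIP local-file-header layout):

```php
<?php
// Verify the EPUB rule by reading the first local file header directly:
// bytes 0-3 signature, 8-9 method, 26-27 name length, 30.. the name itself.
function first_entry_is_stored_mimetype( $zip_bytes ) {
    if ( "PK\x03\x04" !== substr( $zip_bytes, 0, 4 ) ) {
        return false;
    }
    $method = unpack( 'vmethod', substr( $zip_bytes, 8, 2 ) );
    $lens   = unpack( 'vname_len', substr( $zip_bytes, 26, 2 ) );
    $name   = substr( $zip_bytes, 30, $lens['name_len'] );
    return 0 === $method['method'] && 'mimetype' === $name;
}

// Hand-packed local file header for a Stored "mimetype" entry (illustration only —
// a complete ZIP would also need a central directory and end-of-central-directory record).
$body   = 'application/epub+zip';
$ok_zip = "PK\x03\x04"
    . pack( 'v', 20 )                                // version needed to extract
    . pack( 'v', 0 )                                 // general-purpose flags
    . pack( 'v', 0 )                                 // method 0 = Stored
    . pack( 'vv', 0, 0 )                             // mod time / mod date
    . pack( 'V', crc32( $body ) )                    // crc-32
    . pack( 'VV', strlen( $body ), strlen( $body ) ) // compressed / uncompressed size
    . pack( 'vv', 8, 0 )                             // name length ("mimetype"), extra length
    . 'mimetype' . $body;

var_dump( first_entry_is_stored_mimetype( $ok_zip ) ); // bool(true)
```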

    - - -```php -append_file( new FileEntry( array( - 'path' => 'mimetype', - 'compression_method' => ZipDecoder::COMPRESSION_NONE, - 'body_reader' => new MemoryPipe( 'application/epub+zip' ), -) ) ); - -$container = <<<'XML' - - - - -XML; - -foreach ( array( - 'META-INF/container.xml' => $container, - 'EPUB/package.opf' => <<<'XML' -', - 'EPUB/chapter1.xhtml' => <<<'XML' -

    Chapter 1

    It was a dark and stormy night.

    -XML, -) as $name => $body ) { - $enc->append_file( new FileEntry( array( - 'path' => $name, - 'compression_method' => ZipDecoder::COMPRESSION_DEFLATE, - 'body_reader' => new MemoryPipe( $body ), - ) ) ); -} -$enc->close(); -$out->close_writing(); - -$zip = ZipFilesystem::create( FileReadStream::from_path( $path ) ); -printf( "mimetype: %s\n", $zip->get_contents( 'mimetype' ) ); -printf( "size on disk: %d bytes\n", filesize( $path ) ); -``` - - -``` -mimetype: application/epub+zip -size on disk: 726 bytes -``` - -## Stream a large entry without buffering it - -

    Calling get_contents() on a 500 MB CSV inside a ZIP would eat 500 MB of RAM. Use open_read_stream() instead and inflate-as-you-go.

    - -

    Gotcha: Only one entry stream open at a time. Drain or finish the previous stream before opening the next.
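    Draining an entry stream in fixed-size chunks raises one fiddly detail: records that straddle chunk boundaries. A standalone sketch of the tail-buffer pattern on a plain PHP stream (the function name is hypothetical; the same logic applies to the component's pull/consume loop):

```php
<?php
// Count newline-delimited rows in fixed-size chunks. The partial last line of
// each chunk is kept in $tail so rows split across boundaries count exactly once.
function count_rows_chunked( $stream, $chunk_size = 8192 ) {
    $rows = 0;
    $tail = '';
    while ( '' !== ( $chunk = fread( $stream, $chunk_size ) ) && false !== $chunk ) {
        $lines = explode( "\n", $tail . $chunk );
        $tail  = array_pop( $lines ); // may be a partial line
        $rows += count( $lines );
    }
    return $rows + ( '' !== $tail ? 1 : 0 ); // final unterminated line, if any
}

$stream = fopen( 'php://memory', 'r+' );
fwrite( $stream, str_repeat( "id,value\n", 1000 ) );
rewind( $stream );
echo count_rows_chunked( $stream, 13 ) . "\n"; // 1000
```

    The deliberately awkward 13-byte chunk size shows that row counts do not depend on where chunk boundaries fall.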

    - - -```php -append_file( new FileEntry( array( - 'path' => 'data.csv', - 'compression_method' => ZipDecoder::COMPRESSION_DEFLATE, - 'body_reader' => new MemoryPipe( str_repeat( "id,value,timestamp\n1,foo,2024\n2,bar,2024\n", 5000 ) ), -) ) ); -$enc->close(); -$out->close_writing(); - -$zip = ZipFilesystem::create( FileReadStream::from_path( $path ) ); -$stream = $zip->open_read_stream( 'data.csv' ); - -$rows = 0; -$bytes = 0; -$tail = ''; -while ( ! $stream->reached_end_of_data() ) { - $n = $stream->pull( 8192 ); - if ( 0 === $n ) break; - $chunk = $tail . $stream->consume( $n ); - $lines = explode( "\n", $chunk ); - $tail = array_pop( $lines ); - $rows += count( $lines ); - $bytes += $n; -} -printf( "Inflated %d bytes in 8 KB chunks, parsed %d rows.\n", $bytes, $rows ); -``` - - -``` -Inflated 205000 bytes in 8 KB chunks, parsed 15000 rows. -``` - -## Repack: modify one file, copy the rest - -

    Updating one file in a ZIP without rewriting the others is impossible at the format level — the central directory points at byte offsets. The pragmatic answer is repack: stream the source archive into a new one, swapping the file you care about.
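The repack pattern is not toolkit-specific. As an illustration only, here it is with Python's stdlib zipfile (which buffers rather than streams, unlike the toolkit's encoder, but the copy-all-swap-one shape is identical):

```python
import io
import zipfile

def repack(src_bytes, replace_path, new_body):
    """Copy every entry into a fresh archive, swapping one file's body."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
         zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            body = new_body if info.filename == replace_path else src.read(info)
            dst.writestr(info.filename, body)
    return out.getvalue()
```

Every untouched entry is re-deflated here; the format still forces a full rewrite because each entry's offset is pinned by the central directory.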

    - - -```php - '{"debug":false,"version":"1.0"}', - 'app/index.php' => <<<'HTML' - 'body{color:#333} -HTML, -) as $name => $body ) { - $src_enc->append_file( new FileEntry( array( - 'path' => $name, - 'compression_method' => ZipDecoder::COMPRESSION_DEFLATE, - 'body_reader' => new MemoryPipe( $body ), - ) ) ); -} -$src_enc->close(); -$src_out->close_writing(); - -$source = ZipFilesystem::create( FileReadStream::from_path( $src_path ) ); -$dst_path = tempnam( sys_get_temp_dir(), 'repacked' ) . '.zip'; -$dst_out = FileWriteStream::from_path( $dst_path, 'truncate' ); -$dst_enc = new ZipEncoder( $dst_out ); - -$dirs = array( '/' ); -while ( $dirs ) { - $dir = array_shift( $dirs ); - foreach ( $source->ls( $dir ) as $name ) { - $path = rtrim( $dir, '/' ) . '/' . $name; - if ( $source->is_dir( $path ) ) { - $dirs[] = $path; - continue; - } - $rel = ltrim( $path, '/' ); - $body = ( 'config.json' === $rel ) - ? '{"debug":true,"version":"1.0.1"}' - : $source->get_contents( $rel ); - $dst_enc->append_file( new FileEntry( array( - 'path' => $rel, - 'compression_method' => ZipDecoder::COMPRESSION_DEFLATE, - 'body_reader' => new MemoryPipe( $body ), - ) ) ); - } -} -$dst_enc->close(); -$dst_out->close_writing(); - -$repacked = ZipFilesystem::create( FileReadStream::from_path( $dst_path ) ); -echo "new config.json: " . $repacked->get_contents( 'config.json' ) . "\n"; -echo "untouched: " . $repacked->get_contents( 'app/index.php' ) . "\n"; -``` - - -``` -new config.json: {"debug":true,"version":"1.0.1"} -untouched: 'body{color:#333} -``` - -## Defend against zip-slip - -

    A malicious archive can name an entry ../../etc/passwd and trick a naive extractor into clobbering files outside the destination. ZipDecoder::sanitize_path() drops every ../ segment that would climb above the archive root, including leading ../ runs, and collapses repeated slashes before exposing the path.
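The rule the expected outputs pin down can be sketched in a few lines of Python (illustrative only; the PHP method is the source of truth):

```python
def sanitize_zip_path(path):
    """Drop '..' segments that would climb above the root; collapse '//' runs."""
    out = []
    depth = 0
    for seg in path.split('/'):
        if seg == '':
            continue  # swallow empty segments produced by '//'
        if seg == '..':
            if depth == 0:
                continue  # would escape the root: drop it
            depth -= 1  # cancels a real segment, safe to keep verbatim
        elif seg != '.':
            depth += 1  # '.' never changes depth; real names go one level down
        out.append(seg)
    return '/'.join(out)
```

Note that a `..` which stays inside the tree is kept verbatim rather than resolved, which is why `a/../../b/secret` sanitizes to `a/../b/secret` and not `b/secret`.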

    - - -```php - %s\n", $name, ZipDecoder::sanitize_path( $name ) ); -} -``` - - -``` -../../etc/passwd => etc/passwd -./safe/path.txt => ./safe/path.txt -a/../../b/secret => a/../b/secret -a//b///c.txt => a/b/c.txt -../../../../root/.ssh/authorized_keys => root/.ssh/authorized_keys -``` - -## Pipe ZIP entries into an InMemoryFilesystem - -

    Real-world recipe: take an uploaded plugin ZIP, expand it into an InMemoryFilesystem so you can validate, edit, or scan it before it ever touches disk. Three components compose into something you couldn't build with ZipArchive alone.

    - - -```php - <<<'HTML' - ' 'body{margin:0}', - 'app/README.md' => '# App', -) as $name => $body ) { - $enc->append_file( new FileEntry( array( - 'path' => $name, - 'compression_method' => ZipDecoder::COMPRESSION_DEFLATE, - 'body_reader' => new MemoryPipe( $body ), - ) ) ); -} -$enc->close(); -$out->close_writing(); - -$zip = ZipFilesystem::create( FileReadStream::from_path( $path ) ); -$mem = InMemoryFilesystem::create(); -copy_between_filesystems( array( - 'source_filesystem' => $zip, - 'source_path' => '/', - 'target_filesystem' => $mem, - 'target_path' => '/', -) ); - -$mem->put_contents( '/app/VERSION', '1.0.0' ); -echo "files now in memory:\n"; -$dirs = array( '/' ); -$files = array(); -while ( $dirs ) { - $dir = array_shift( $dirs ); - foreach ( $mem->ls( $dir ) as $name ) { - $p = rtrim( $dir, '/' ) . '/' . $name; - if ( $mem->is_dir( $p ) ) { - $dirs[] = $p; - continue; - } - $files[] = $p; - } -} -sort( $files ); -foreach ( $files as $path ) { - echo " " . $path . "\n"; -} -``` - - -``` -files now in memory: - /app/README.md - /app/VERSION - /app/assets/style.css - /app/index.php -``` - -## When to use which type - - - - - - - - -
| Use | For |
|-----|-----|
| `ZipFilesystem::create()` | Reading. You want `get_contents()`, `ls()`, `is_dir()` over a ZIP. The most common case. |
| `ZipEncoder` | Writing. Stream entries into any `ByteWriteStream` sink. Required when format rules matter (EPUB, `.docx`). |
| `ZipDecoder` | Low-level read access to the central directory and individual entry headers. Most code reaches for `ZipFilesystem` instead. |
| `open_read_stream()` on a `ZipFilesystem` | Inflating a single large entry without buffering it whole in memory. |
| `copy_between_filesystems()` | Moving entries from a ZIP into another filesystem (memory, local, SQLite). |
    - -

    Footgun: Updating an entry in place is impossible. The central directory points at byte offsets — change one entry's compressed size and every later offset shifts. Repack into a new archive instead.

    - -

    Footgun: Never extract entry paths verbatim. Always run paths through ZipDecoder::sanitize_path(). Without it, a hostile archive can write outside the destination directory.

    - -

    Footgun: Encrypted archives aren't supported. If you need to read AES-encrypted ZIPs, this isn't the component. The file format technically allows encryption, but the toolkit deliberately excludes it because the implementation surface is large and the use case is rare in WordPress contexts.

    diff --git a/bin/_extract_catalog.py b/bin/_extract_catalog.py deleted file mode 100644 index d1c9d958a..000000000 --- a/bin/_extract_catalog.py +++ /dev/null @@ -1,142 +0,0 @@ -#!/usr/bin/env python3 -"""One-shot tool: dumps the current bin/_docs_components.py + sibling -data (CREDITS, COMPONENT_RELATIONS, _expected_outputs.json) into per- -component markdown files under bin/_docs_components/.md. - -Run when migrating to or refreshing the markdown source. Each file ends -up self-describing — frontmatter for metadata, body for prose, fenced -blocks for snippets and their captured expected outputs. -""" - -import json -import os -import re -import sys - -THIS = os.path.dirname(os.path.abspath(__file__)) -sys.path.insert(0, THIS) - -OUT_DIR = os.path.join(THIS, '_docs_components') -EXPECTED_PATH = os.path.join(THIS, '_expected_outputs.json') - - -def load_sources(): - """Pull legacy data straight from the original Python module so this - script can be re-run after content updates. Falls back to importing - only what's still defined.""" - from _docs_components import COMPONENTS # noqa: E402 - try: - from _docs_components import CREDITS - except ImportError: - CREDITS = {} - try: - from _docs_components import COMPONENT_RELATIONS - except ImportError: - COMPONENT_RELATIONS = {} - expected = {} - if os.path.exists(EXPECTED_PATH): - with open(EXPECTED_PATH) as f: - for k, v in json.load(f).items(): - slug, _, fname = k.partition('::') - expected[(slug, fname)] = v - return COMPONENTS, CREDITS, COMPONENT_RELATIONS, expected - - -def split_html_blocks(html): - """Break a flat HTML body into top-level blocks for prettier markdown.""" - pattern = re.compile( - r'(<(?:p|ul|ol|pre|blockquote|table|h[1-6]|div|aside)\b[^>]*>.*?)', - re.DOTALL | re.IGNORECASE, - ) - parts = pattern.split(html) - return [p.strip() for p in parts if p.strip()] - - -def write_component(slug, title, lede, install, sections, credit, see_also, expected): - lines = [ - '---', - f'slug: {slug}', - 
f'title: {title}', - ] - if install: - lines.append(f'install: {install}') - if credit: - credit_title, credit_body = credit - lines.append('') - lines.append(f'credit_title: {credit_title}') - # Multi-line block: indent every line by 2 spaces. - lines.append('credit_body: |') - for chunk in split_html_blocks(credit_body) or [credit_body.strip()]: - for sub in chunk.splitlines(): - lines.append(f' {sub}') - if see_also: - lines.append('') - for rel_slug, rel_title, reason in see_also: - lines.append(f'see_also: {rel_slug} | {rel_title} | {reason}') - lines.append('---') - lines.append('') - lines.append(lede.rstrip()) - lines.append('') - - for heading, body, snippet in sections: - lines.append(f'## {heading}') - lines.append('') - if body: - for chunk in split_html_blocks(body): - lines.append(chunk) - lines.append('') - - if snippet: - filename = snippet[0] - code = snippet[1] - runnable = len(snippet) < 3 or snippet[2] - fence = '```' - while fence in code: - fence += '`' - lines.append('') - lines.append(f'{fence}php') - lines.append(code.rstrip('\n')) - lines.append(fence) - lines.append('') - - exp = expected.get((slug, filename)) - if exp is not None: - # Pick a fence longer than any backtick run inside the output. 
- exp_fence = '```' - while exp_fence in exp: - exp_fence += '`' - lines.append('') - lines.append(exp_fence) - lines.append(exp.rstrip('\n')) - lines.append(exp_fence) - lines.append('') - - out = '\n'.join(lines).rstrip() + '\n' - path = os.path.join(OUT_DIR, f'{slug}.md') - with open(path, 'w', encoding='utf-8') as f: - f.write(out) - return path - - -def main(): - os.makedirs(OUT_DIR, exist_ok=True) - components, credits, relations, expected = load_sources() - written = [] - for slug, title, lede, install, sections in components: - credit = credits.get(slug) - see_also = relations.get(slug, ()) - path = write_component( - slug, title, lede, install, sections, - credit, see_also, expected, - ) - written.append((slug, path)) - print(f'Extracted {len(written)} components to {OUT_DIR}/') - for slug, path in written: - print(f' {slug:<20} {os.path.relpath(path)}') - - -if __name__ == '__main__': - main() diff --git a/bin/_load_catalog.py b/bin/_load_catalog.py index 22c4380b0..a63a2f7f0 100644 --- a/bin/_load_catalog.py +++ b/bin/_load_catalog.py @@ -1,5 +1,13 @@ -"""Loads bin/_docs_components/.md into the COMPONENTS data structure -that the build scripts and the snippet runner consume. +"""Loads each `components//README.md` into the COMPONENTS data +structure the build scripts and the snippet runner consume. + +The README *is* the catalog source: it doubles as the GitHub/Packagist +README and as the docs-site catalog. YAML-style frontmatter at the top +carries the slug/title/install/credit/see-also metadata; the body is +plain markdown with fenced PHP snippets and `` +fenced blocks. GitHub's renderer hides frontmatter from the README view +on github.com, so the metadata is invisible to readers but available to +the build pipeline. 
Markdown file format (one per component): @@ -53,7 +61,34 @@ import re THIS = os.path.dirname(os.path.abspath(__file__)) -COMPONENT_DIR = os.path.join(THIS, '_docs_components') +ROOT = os.path.dirname(THIS) +COMPONENTS_ROOT = os.path.join(ROOT, 'components') + +# Slug → component-directory mapping. Each component's README.md *is* the +# catalog source: it carries the YAML-style frontmatter, lede, sections, +# snippets, and expected-output blocks the docs site needs. The ordered +# tuple here also defines the order components appear on the landing page +# and in the reference sidebar. +COMPONENT_ORDER = ( + ('html', 'HTML'), + ('zip', 'Zip'), + ('bytestream', 'ByteStream'), + ('filesystem', 'Filesystem'), + ('blockparser', 'BlockParser'), + ('markdown', 'Markdown'), + ('xml', 'XML'), + ('encoding', 'Encoding'), + ('dataliberation', 'DataLiberation'), + ('git', 'Git'), + ('merge', 'Merge'), + ('httpclient', 'HttpClient'), + ('httpserver', 'HttpServer'), + ('corsproxy', 'CORSProxy'), + ('cli', 'CLI'), + ('polyfill', 'Polyfill'), + ('blueprints', 'Blueprints'), + ('coding-standards', 'ToolkitCodingStandards'), +) _FRONTMATTER_RE = re.compile(r'\A---\n(.*?)\n---\n?', re.DOTALL) _SNIPPET_RE = re.compile( @@ -290,28 +325,8 @@ def load_components_rich(): ] """ components = [] - manifest = os.path.join(COMPONENT_DIR, '_order.txt') - if os.path.exists(manifest): - with open(manifest) as f: - ordered_slugs = [ - line.strip() - for line in f - if line.strip() and not line.startswith('#') - ] - else: - import sys as _sys - print( - f'WARNING: {manifest} missing; loading components in filesystem order', - file=_sys.stderr, - ) - ordered_slugs = sorted( - os.path.splitext(name)[0] - for name in os.listdir(COMPONENT_DIR) - if name.endswith('.md') and not name.startswith('_') - ) - - for slug in ordered_slugs: - path = os.path.join(COMPONENT_DIR, f'{slug}.md') + for slug, dir_name in COMPONENT_ORDER: + path = os.path.join(COMPONENTS_ROOT, dir_name, 'README.md') with open(path, 
encoding='utf-8') as f: text = f.read() fields, body = _parse_frontmatter(text) diff --git a/bin/build-docs-bundle.sh b/bin/build-docs-bundle.sh index 3da4bca22..1229fa147 100755 --- a/bin/build-docs-bundle.sh +++ b/bin/build-docs-bundle.sh @@ -1,7 +1,7 @@ #!/usr/bin/env bash # Rebuilds docs/assets/php-toolkit.zip and regenerates the docs HTML pages -# from the markdown sources in bin/_docs_components/. Run this whenever -# components/ changes or a per-component .md file is edited. +# from each components//README.md. Run this whenever a component's +# README.md or its source changes. set -euo pipefail cd "$(dirname "$0")/.." diff --git a/bin/build-reference.py b/bin/build-reference.py index f768f95ac..b8401993c 100644 --- a/bin/build-reference.py +++ b/bin/build-reference.py @@ -1,10 +1,12 @@ #!/usr/bin/env python3 """Generates docs/reference/.html for every component. -The catalog comes from bin/_docs_components/.md (loaded via -bin/_docs_components.py). Every page uses the same concept-guide shape: -lede + install + context paragraphs + minimal example + refinements + -pitfalls + see also. There are no hand-authored exceptions. +The catalog comes from components//README.md (loaded via +bin/_load_catalog.py). Each README *is* the catalog source — frontmatter ++ lede + sections + snippets + expected-output fences. Every page uses +the same concept-guide shape: lede + install + context paragraphs + +minimal example + refinements + pitfalls + see also. There are no +hand-authored exceptions. """ import os diff --git a/bin/run-snippets.py b/bin/run-snippets.py index ebe7c3bf6..0ff6af0f0 100755 --- a/bin/run-snippets.py +++ b/bin/run-snippets.py @@ -1,5 +1,5 @@ #!/usr/bin/env python3 -"""Runs every PHP snippet declared in bin/_docs_components/.md against +"""Runs every PHP snippet declared in components//README.md against the local toolkit and compares stdout to the captured expected-output that lives next to the snippet in markdown. 
Used in two ways: @@ -33,7 +33,7 @@ from _load_catalog import load_components_rich # noqa: E402 VENDOR_AUTOLOAD = os.path.join(ROOT, 'vendor', 'autoload.php') -COMPONENT_DIR = os.path.join(THIS, '_docs_components') +COMPONENTS_ROOT = os.path.join(ROOT, 'components') # Runnable snippets whose stdout is unstable. They exit 0 but their output # is not pinned (real network traffic, timestamps, host-specific values). @@ -105,9 +105,13 @@ def normalize(text): def write_expected_output(slug, filename, new_output): - """Write a new captured stdout into the slug's markdown file, creating + """Write a new captured stdout into the component's README.md, creating or updating the snippet's `` fence.""" - path = os.path.join(COMPONENT_DIR, f'{slug}.md') + from _load_catalog import COMPONENT_ORDER + dir_name = dict(COMPONENT_ORDER).get(slug) + if not dir_name: + raise ValueError(f'Unknown component slug: {slug}') + path = os.path.join(COMPONENTS_ROOT, dir_name, 'README.md') with open(path, encoding='utf-8') as f: text = f.read() diff --git a/components/BlockParser/README.md b/components/BlockParser/README.md index 54f0b7274..95cb8c22b 100644 --- a/components/BlockParser/README.md +++ b/components/BlockParser/README.md @@ -1,142 +1,301 @@ -# BlockParser +--- +slug: blockparser +title: BlockParser +install: wp-php-toolkit/blockparser - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/blockparser.html](https://wordpress.github.io/php-toolkit/reference/blockparser.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +credit_title: WordPress core, packaged standalone +credit_body: | + WP_Block_Parser is WordPress core's block parser, packaged here so importers and linters can read block markup without booting WordPress. Source: WordPress/wordpress-develop. -## Why this exists +see_also: html | HTML | Inspect or rewrite the HTML carried by parsed blocks. 
+see_also: markdown | Markdown | Move between author-friendly Markdown and serialized block markup. +see_also: dataliberation | DataLiberation | Audit and transform blocks while migrating content. +--- -WordPress stores post content as annotated HTML. Instead of inventing a separate file format, it embeds block boundaries directly inside HTML comments: +WordPress core's block parser, packaged as a standalone library. Turn block markup into a structured tree, lint posts for common authoring mistakes, and audit block usage — all without booting WordPress. -```html - -

    Hello, world.

    - +## Why this exists - -
    - -``` +

    Block markup is not plain HTML. A post can contain HTML comments that identify blocks, JSON attributes inside those comments, freeform HTML between blocks, and nested blocks whose rendered HTML is interleaved with parent markup.

    -Every WordPress editor, REST API response, and block renderer needs to turn that serialized markup into a structured tree. WordPress core ships `WP_Block_Parser` to do exactly that — but it's buried inside WordPress itself, tied to the full WordPress load. This component extracts it so you can parse block markup anywhere: CLI tools, build scripts, data-migration pipelines, standalone PHP apps — without booting WordPress. +

    This component packages WordPress core's block parser so importers, linters, migration tools, and static analyzers can understand block content without loading WordPress. It deliberately mirrors core behavior — same array shape, same null blocks for freeform HTML, same core block names such as core/paragraph — so code written against this parser keeps working when run inside WordPress, and vice versa.

    -## How it works +

    Reach for it when you need answers about the block tree: which blocks a post uses, which attributes they carry, where nested blocks appear, or whether content violates a rule your project cares about.

    -The parser is a single-pass, stack-based scanner. It moves forward through the document looking for HTML comments that follow the block annotation pattern. When it finds an opening comment like ``, it: +## What you get back -1. Decodes the JSON attributes from the comment body. -2. Pushes a frame onto a stack, recording the block name, attributes, and the byte offset where the block started. -3. Keeps scanning, collecting the raw HTML between the opening and closing comments as `innerHTML`. -4. If it encounters another `` before the closing comment, it recurses — pushing a new frame for the inner block. -5. When it finds a closing comment (``), it pops the frame, attaches any collected inner blocks, and appends the completed block to its parent. +

    WP_Block_Parser::parse() returns an array of blocks. Each block is an associative array with five keys: blockName, attrs, innerBlocks, innerHTML, and innerContent.

    -Freeform content between blocks — plain HTML with no block annotations — becomes a "classic block" with `blockName` set to `null`. +

    innerHTML is the HTML inside the block with inner blocks stripped out. innerContent is the interleaved version: an array of HTML strings with null placeholders marking where each inner block belongs.
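The interleaving rule is easiest to see executed. A sketch in Python, with dicts standing in for the PHP associative arrays (illustrative, not the component's API):

```python
def render(block):
    """Rebuild markup from innerContent: strings are verbatim HTML chunks,
    None placeholders pull the next child from innerBlocks, recursively."""
    html = []
    child = 0
    for chunk in block['innerContent']:
        if chunk is None:
            html.append(render(block['innerBlocks'][child]))
            child += 1
        else:
            html.append(chunk)
    return ''.join(html)
```

This is how a container block such as Columns records which raw wrapper HTML surrounds each rendered child while keeping the children themselves in `innerBlocks`.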

    -The `innerContent` array is the most subtle part of the output. It interleaves child block positions with raw HTML chunks, letting renderers reconstruct the exact original layout. This is how the columns block describes which raw HTML wraps each inner column. +

    Most code starts by checking blockName, then reading attrs or innerHTML. When a post has container blocks such as Group, Columns, or Navigation, look inside innerBlocks too.

    -## Usage +

    Footgun: Freeform HTML between blocks shows up as a block with blockName === null. Always skip that case before comparing names.

    -### Parse a post's block content +## Parse a document +

    The simplest possible use. Pass a string, get back a tree.

    + + ```php -use WordPress\BlockParser\WP_Block_Parser; +parse( $post_content ); +$document = "\n

    Welcome

    \n\n\n" + . "\n

    Hello from the block editor.

    \n"; +$blocks = ( new WP_Block_Parser() )->parse( $document ); foreach ( $blocks as $block ) { - echo $block['blockName']; // e.g. "core/paragraph" - echo $block['innerHTML']; // the raw HTML inside the block - // $block['attrs'] — decoded JSON attributes - // $block['innerBlocks'] — nested blocks (same structure, recursive) - // $block['innerContent'] — interleaved HTML chunks + child-block slots + if ( null === $block['blockName'] ) { + continue; + } + echo $block['blockName'] . ': ' . trim( strip_tags( $block['innerHTML'] ) ) . "\n"; } ``` -### Inspect block attributes + +``` +core/heading: Welcome +core/paragraph: Hello from the block editor. +``` + +## Count every block type in a post -Attributes are encoded as JSON in the opening comment and decoded automatically: +

    A common audit task: "How many Paragraph, Image, and Gallery blocks does this post use?" A small queue keeps the example readable while still visiting nested blocks.

    + ```php -$markup = '' - . '
    ...
    ' - . ''; +
    " + . "

    Title

    " + . "

    One.

    " + . "

    Two.

    " + . "
    " + . "
    "; + +$blocks = ( new WP_Block_Parser() )->parse( $document ); + +$counts = array(); +$queue = $blocks; + +while ( ! empty( $queue ) ) { + $block = array_shift( $queue ); + + if ( null !== $block['blockName'] ) { + $name = $block['blockName']; + $counts[ $name ] = isset( $counts[ $name ] ) ? $counts[ $name ] + 1 : 1; + } -$blocks = $parser->parse( $markup ); -echo $blocks[0]['attrs']['sizeSlug']; // "large" + foreach ( $block['innerBlocks'] as $inner_block ) { + $queue[] = $inner_block; + } +} + +arsort( $counts ); +foreach ( $counts as $name => $n ) { + echo str_pad( (string) $n, 4, ' ', STR_PAD_LEFT ) . ' ' . $name . "\n"; +} +``` + + +``` + 2 core/paragraph + 1 core/group + 1 core/heading + 1 core/image ``` -### Walk a nested block tree +## Check whether a post uses a block -Blocks can contain other blocks. The `innerBlocks` key holds them recursively: +

    Useful for templates, audits, and migrations: answer one yes/no question without caring where the block appears in the tree.

    + ```php -function walk( array $blocks, int $depth = 0 ): void { - foreach ( $blocks as $block ) { - if ( $block['blockName'] === null ) { - continue; // skip freeform HTML between blocks - } - echo str_repeat( ' ', $depth ) . $block['blockName'] . "\n"; - walk( $block['innerBlocks'], $depth + 1 ); - } +
    " + . "
    " + . "" + . "
    " + . "
    "; + +$blocks = ( new WP_Block_Parser() )->parse( $document ); + +function post_has_block( $blocks, $name ) { + $queue = $blocks; + + while ( ! empty( $queue ) ) { + $block = array_shift( $queue ); + if ( $name === $block['blockName'] ) { + return true; + } + + foreach ( $block['innerBlocks'] as $inner_block ) { + $queue[] = $inner_block; + } + } + + return false; } -walk( $parser->parse( $post_content ) ); -// core/columns -// core/column -// core/paragraph -// core/column -// core/image +echo post_has_block( $blocks, 'core/button' ) ? "has button\n" : "missing button\n"; +echo post_has_block( $blocks, 'core/gallery' ) ? "has gallery\n" : "missing gallery\n"; +``` + + +``` +has button +missing gallery ``` -### Reconstruct output using innerContent +## Lint headings for hierarchy mistakes -The `innerContent` array lets you rebuild the original markup while swapping in rendered child blocks: +

    "Don't skip from H2 to H4" is a real accessibility rule. The helper below keeps headings in document order, including headings nested inside Group, Column, and Cover blocks.

    + ```php -function render_block( array $block ): string { - $output = ''; - $child_index = 0; - - foreach ( $block['innerContent'] as $chunk ) { - if ( is_string( $chunk ) ) { - $output .= $chunk; - } else { - // null = "insert rendered child block here" - $output .= render_block( $block['innerBlocks'][ $child_index++ ] ); - } - } - - return $output; +\n

    Intro

    \n" + . "\n

    Subsection

    \n" + . "\n

    Body

    \n"; + +$blocks = ( new WP_Block_Parser() )->parse( $document ); + +function collect_headings( $blocks, &$headings ) { + foreach ( $blocks as $block ) { + if ( 'core/heading' === $block['blockName'] ) { + $headings[] = array( + 'level' => isset( $block['attrs']['level'] ) ? (int) $block['attrs']['level'] : 2, + 'text' => trim( strip_tags( $block['innerHTML'] ) ), + ); + } + + collect_headings( $block['innerBlocks'], $headings ); + } +} + +$headings = array(); +collect_headings( $blocks, $headings ); + +$last = 1; +foreach ( $headings as $heading ) { + $level = $heading['level']; + $label = $heading['text']; + + if ( $level > $last + 1 ) { + echo "WARN {$label}: jumped from H{$last} to H{$level}\n"; + } else { + echo "ok {$label}: H{$level}\n"; + } + $last = $level; } ``` -### Find all blocks of a specific type + +``` +ok Intro: H2 +WARN Subsection: jumped from H2 to H4 +ok Body: H3 +``` + +## Find all instances of a custom block + +

    When auditing an export for a block your plugin owns, collect every match and print the fields a human cares about.

    + ```php -function find_blocks( array $blocks, string $name ): array { - $found = array(); - foreach ( $blocks as $block ) { - if ( $block['blockName'] === $name ) { - $found[] = $block; - } - $found = array_merge( $found, find_blocks( $block['innerBlocks'], $name ) ); - } - return $found; +

    Reviews

    " + . "" + . "
    Loved it.
    " + . "" + . "" + . "
    Pretty good.
    " + . ""; + +$blocks = ( new WP_Block_Parser() )->parse( $document ); + +function find_blocks_by_name( $blocks, $name, &$matches ) { + foreach ( $blocks as $block ) { + if ( $name === $block['blockName'] ) { + $matches[] = $block; + } + + find_blocks_by_name( $block['innerBlocks'], $name, $matches ); + } } -$images = find_blocks( $parser->parse( $post_content ), 'core/image' ); +$testimonials = array(); +find_blocks_by_name( $blocks, 'my-plugin/testimonial', $testimonials ); + +foreach ( $testimonials as $i => $b ) { + echo ( $i + 1 ) . '. ' . $b['attrs']['author'] . ' (' . $b['attrs']['rating'] . '/5): ' + . trim( strip_tags( $b['innerHTML'] ) ) . "\n"; +} +``` + + +``` +1. Jane (5/5): Loved it. +2. Joe (4/5): Pretty good. ``` -## Block structure reference +## Detect blocks with stale embed URLs -Each parsed block is an associative array: +

    A real-world content audit: find every core/embed whose URL points at a domain you have retired.

    -| Key | Type | Description | -|-----|------|-------------| -| `blockName` | `string\|null` | Namespaced block name, e.g. `"core/paragraph"`. `null` for classic/freeform content between blocks. | -| `attrs` | `array` | Decoded JSON attributes from the opening comment. Empty array if none. | -| `innerBlocks` | `array` | Recursively parsed child blocks in order of appearance. | -| `innerHTML` | `string` | The full raw HTML between the opening and closing comments, including inner block markup verbatim. | -| `innerContent` | `array` | Interleaved array: strings are raw HTML chunks, `null` values mark positions where a child block from `innerBlocks` should be inserted. | + +```php + + + +HTML; + +$retired = array( 'vine.co', 'plus.google.com' ); + +foreach ( ( new WP_Block_Parser() )->parse( $document ) as $b ) { + if ( 'core/embed' !== $b['blockName'] ) { + continue; + } + $url = isset( $b['attrs']['url'] ) ? $b['attrs']['url'] : ''; + $host = parse_url( $url, PHP_URL_HOST ); + $bad = $host && in_array( $host, $retired, true ); + echo ( $bad ? 'STALE ' : 'ok ' ) . $url . "\n"; +} +``` + + +``` +ok https://twitter.com/wordpress/status/1 +ok https://youtube.com/watch?v=abc +STALE https://vine.co/v/xyz +``` diff --git a/components/Blueprints/README.md b/components/Blueprints/README.md index 1ffd71e26..95bbe7ca5 100644 --- a/components/Blueprints/README.md +++ b/components/Blueprints/README.md @@ -1,301 +1,200 @@ -# Blueprints +--- +slug: blueprints +title: Blueprints +install: wp-php-toolkit/blueprints - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/blueprints.html](https://wordpress.github.io/php-toolkit/reference/blueprints.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: filesystem | Filesystem | Prepare files and fixtures before applying site setup steps. +see_also: httpclient | HttpClient | Download packages or source data as part of provisioning workflows. 
+see_also: cli | CLI | Wrap repeatable blueprint operations in a small command. +--- -Declarative WordPress site provisioning. Define a site's desired state as a JSON blueprint -- which plugins to install, which options to set, which content to import -- and let the runner execute it. Blueprints can create a new WordPress site from scratch or modify an existing one, making them useful for development environments, demo sites, automated testing, and reproducible WordPress setups. +Declarative WordPress site provisioning. Write a JSON description of plugins, options, and content; let the runner execute it. -## Installation +## Why this exists -``` -composer require wp-php-toolkit/blueprints -``` +

    A WordPress environment is more than a database dump. It can require a specific core version, plugins, themes, site options, uploaded files, content, and setup steps. Rebuilding that by hand makes demos, tests, bug reports, workshops, and CI fixtures drift over time.

    + +

    The Blueprints component treats site setup as data. A blueprint JSON document describes the desired steps, and the runner applies them to either a new WordPress install or an existing one. The validator exists because user-authored JSON needs clear, path-specific errors rather than generic schema failures.

    -## Quick Start +

    RunnerConfiguration separates the web root from the WordPress core directory, since real hosts often put them in different places. Both paths are explicit on the runner, never inferred.

    -Create a new WordPress site from a blueprint JSON file: +

    Blueprints can create a new WordPress install (download core, set up the database, apply steps) or apply to an existing site. Creating a fresh install needs filesystem access this in-browser runtime doesn't have, so the runnable snippets focus on APPLY_TO_EXISTING_SITE.

    +## Configure a runner for an existing site + +

    RunnerConfiguration is a fluent builder. The minimum: target site root, target site URL, execution mode.

    + + ```php +set_execution_mode( Runner::EXECUTION_MODE_CREATE_NEW_SITE ) - ->set_blueprint( new AbsoluteLocalPath( '/path/to/blueprint.json' ) ) - ->set_target_site_root( '/var/www/my-site' ) - ->set_target_site_url( 'http://localhost:8080' ) - ->set_database_engine( 'sqlite' ); - -$runner = new Runner( $config ); -$runner->run(); -``` - -Where `blueprint.json` looks like: + ->set_execution_mode( Runner::EXECUTION_MODE_APPLY_TO_EXISTING_SITE ) + ->set_target_site_root( '/wordpress' ) + ->set_target_site_url( 'http://playground.test/' ); -```json -{ - "version": 2, - "steps": [ - { - "step": "installPlugin", - "pluginData": "https://downloads.wordpress.org/plugin/gutenberg.zip" - }, - { - "step": "setSiteOptions", - "options": { - "blogname": "My Test Site", - "blogdescription": "Built with Blueprints" - } - } - ] -} +echo "mode: " . $config->get_execution_mode() . "\n"; +echo "root: " . $config->get_target_site_root() . "\n"; +echo "url: " . $config->get_target_site_url() . "\n"; ``` -## Usage - -### Execution modes + +``` +mode: apply-to-existing-site +root: /wordpress +url: http://playground.test/ +``` -Blueprints supports two execution modes: +## Generate blueprint JSON from PHP -- **`EXECUTION_MODE_CREATE_NEW_SITE`** -- Downloads WordPress, creates the database, and applies the blueprint steps. Use this for spinning up fresh sites. -- **`EXECUTION_MODE_APPLY_TO_EXISTING_SITE`** -- Applies the blueprint steps to an already-installed WordPress site. Use this for modifying live or staging sites. +

    CI jobs and tests stay clearer when PHP builds the blueprint from data instead of hand-writing JSON. Keep the structure plain: version, then a list of step arrays.

    + ```php -use WordPress\Blueprints\Runner; -use WordPress\Blueprints\RunnerConfiguration; -use WordPress\Blueprints\DataReference\AbsoluteLocalPath; - -// Apply a blueprint to an existing site -$config = ( new RunnerConfiguration() ) - ->set_execution_mode( Runner::EXECUTION_MODE_APPLY_TO_EXISTING_SITE ) - ->set_blueprint( new AbsoluteLocalPath( '/path/to/blueprint.json' ) ) - ->set_target_site_root( '/var/www/existing-site' ) - ->set_target_site_url( 'http://localhost:8080' ) - ->set_database_engine( 'mysql' ) - ->set_database_credentials( array( - 'host' => '127.0.0.1', - 'port' => 3306, - 'user' => 'wp', - 'password' => 'secret', - 'dbname' => 'wordpress', - ) ); - -$runner = new Runner( $config ); -$runner->run(); -``` + 2, + 'steps' => array( + array( + 'step' => 'setSiteOptions', + 'options' => array( + 'blogname' => $site_name, + 'permalink_structure' => '/%postname%/', + 'show_on_front' => 'page', + ), + ), + ), +); -### Blueprint JSON structure +foreach ( $plugins as $slug ) { + $blueprint['steps'][] = array( + 'step' => 'installPlugin', + 'pluginData' => "https://downloads.wordpress.org/plugin/{$slug}.zip", + ); + $blueprint['steps'][] = array( + 'step' => 'activatePlugin', + 'plugin' => "{$slug}/{$slug}.php", + ); +} -A blueprint is a JSON document with a `version` field and a `steps` array. Each step declares a single operation: +echo json_encode( $blueprint, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES ) . 
"\n"; +``` -```json + +``` { "version": 2, "steps": [ { - "step": "mkdir", - "path": "wp-content/custom-dir" - }, - { - "step": "writeFiles", - "files": { - "wp-content/custom-dir/config.txt": { - "data": "inline", - "content": "key=value" - } + "step": "setSiteOptions", + "options": { + "blogname": "Demo Site", + "permalink_structure": "/%postname%/", + "show_on_front": "page" } }, { "step": "installPlugin", - "pluginData": "https://downloads.wordpress.org/plugin/akismet.zip" + "pluginData": "https://downloads.wordpress.org/plugin/gutenberg.zip" }, { "step": "activatePlugin", - "plugin": "akismet/akismet.php" - }, - { - "step": "installTheme", - "themeData": "https://downloads.wordpress.org/theme/twentytwentyfour.zip" + "plugin": "gutenberg/gutenberg.php" }, { - "step": "activateTheme", - "theme": "twentytwentyfour" - }, - { - "step": "setSiteOptions", - "options": { - "blogname": "My Site", - "permalink_structure": "/%postname%/" - } - }, - { - "step": "runPHP", - "code": "The schema validator returns a human-readable ValidationError instead of a generic "does not match schema" failure. Use it before handing user-authored JSON to a runner.

    + ```php -use WordPress\Blueprints\ProgressObserver; -use WordPress\Blueprints\Runner; -use WordPress\Blueprints\RunnerConfiguration; - -$observer = new ProgressObserver(); -$observer->on( - 'progress', - function ( $event ) { - echo sprintf( - "[%d%%] %s\n", - $event->progress, - $event->caption - ); - } -); +set_progress_observer( $observer ); - // ... other configuration ... -``` - -### Blueprint validation - -Validate a blueprint against the JSON schema before executing it: - -```php use WordPress\Blueprints\Validator\HumanFriendlySchemaValidator; $schema = array( - 'type' => 'object', - 'properties' => array( - 'version' => array( 'type' => 'integer' ), - 'steps' => array( 'type' => 'array' ), - ), - 'required' => array( 'version' ), + 'type' => 'object', + 'required' => array( 'version', 'steps' ), + 'properties' => array( + 'version' => array( 'type' => 'integer' ), + 'steps' => array( + 'type' => 'array', + 'items' => array( + 'type' => 'object', + 'required' => array( 'step' ), + 'properties' => array( + 'step' => array( 'type' => 'string' ), + ), + ), + ), + ), ); -$validator = new HumanFriendlySchemaValidator( $schema ); -$error = $validator->validate( json_decode( $blueprint_json ) ); +$blueprint = array( + 'version' => 2, + 'steps' => array( + array( 'pluginData' => 'https://downloads.wordpress.org/plugin/gutenberg.zip' ), + ), +); -if ( null !== $error ) { - echo 'Validation failed: ' . $error->get_message(); +$error = ( new HumanFriendlySchemaValidator( $schema ) )->validate( $blueprint ); +if ( null === $error ) { + echo "valid\n"; +} else { + echo $error->get_pretty_path() . ": " . $error->message . "\n"; } ``` -## API Reference - -### Core classes - -| Class | Purpose | -|-------|---------| -| `Runner` | Executes a blueprint. Constructor takes a `RunnerConfiguration`. Call `run()` to execute. | -| `RunnerConfiguration` | Fluent configuration builder. 
Key methods: `set_blueprint()`, `set_execution_mode()`, `set_target_site_root()`, `set_target_site_url()`, `set_database_engine()`, `set_database_credentials()`, `set_progress_observer()`. | -| `Runtime` | Execution context available to steps. Provides `get_target_filesystem()`, `eval_php_code_in_subprocess()`. | - -### Execution mode constants - -| Constant | Value | -|----------|-------| -| `Runner::EXECUTION_MODE_CREATE_NEW_SITE` | `'create-new-site'` | -| `Runner::EXECUTION_MODE_APPLY_TO_EXISTING_SITE` | `'apply-to-existing-site'` | - -### Data reference classes - -| Class | Purpose | -|-------|---------| -| `DataReference` | Factory class. Use `DataReference::create( $value )` to auto-detect the source type. | -| `InlineFile` | Embed file content directly. Constructor takes `array( 'filename' => '...', 'content' => '...' )`. | -| `AbsoluteLocalPath` | Reference a file by its absolute path on disk. | -| `ExecutionContextPath` | Reference a file relative to the blueprint's directory. | -| `URLReference` | Reference a file by URL (downloaded at execution time). | -| `WordPressOrgPlugin` | Reference a plugin on wordpress.org by slug. | -| `WordPressOrgTheme` | Reference a theme on wordpress.org by slug. | - -### Validation - -| Class | Purpose | -|-------|---------| -| `HumanFriendlySchemaValidator` | Validates data against a JSON Schema. Returns `null` on success or a `ValidationError` on failure. | - -## Requirements + +``` +Blueprint root["steps"][0]: Missing required field: step. +``` -- PHP 7.2+ -- No external dependencies +## The Blueprint JSON shape + +

    A blueprint is a JSON document with a version field and a steps array. Each step has a "step" discriminator and step-specific fields. This is the same shape used by WordPress Playground.

    + +
    {
    +  "version": 2,
    +  "steps": [
    +    { "step": "setSiteOptions",
    +      "options": {
    +        "blogname": "Demo Site",
    +        "permalink_structure": "/%postname%/"
    +      } },
    +    { "step": "installPlugin",
    +      "pluginData": "https://downloads.wordpress.org/plugin/gutenberg.zip" },
    +    { "step": "activatePlugin",
    +      "plugin": "gutenberg/gutenberg.php" }
    +  ]
    +}
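
A sketch of feeding a JSON document like the one above to the runner. The site root and URL are placeholders; the class names come from this component's data-reference API.

```php
use WordPress\Blueprints\Runner;
use WordPress\Blueprints\RunnerConfiguration;
use WordPress\Blueprints\DataReference\AbsoluteLocalPath;

// Write the blueprint JSON to disk, then point the runner at it.
$blueprint_json = json_encode( array(
	'version' => 2,
	'steps'   => array(
		array( 'step' => 'setSiteOptions', 'options' => array( 'blogname' => 'Demo Site' ) ),
	),
) );
$path = tempnam( sys_get_temp_dir(), 'blueprint' );
file_put_contents( $path, $blueprint_json );

$config = ( new RunnerConfiguration() )
	->set_execution_mode( Runner::EXECUTION_MODE_APPLY_TO_EXISTING_SITE )
	->set_blueprint( new AbsoluteLocalPath( $path ) )
	->set_target_site_root( '/wordpress' )
	->set_target_site_url( 'http://playground.test/' );

( new Runner( $config ) )->run();
```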
    diff --git a/components/ByteStream/README.md b/components/ByteStream/README.md index a9bede580..0a40453c8 100644 --- a/components/ByteStream/README.md +++ b/components/ByteStream/README.md @@ -1,260 +1,203 @@ -# ByteStream +--- +slug: bytestream +title: ByteStream +install: wp-php-toolkit/bytestream - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/bytestream.html](https://wordpress.github.io/php-toolkit/reference/bytestream.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: filesystem | Filesystem | Back file reads and writes with the same stream primitives. +see_also: zip | Zip | Read and write archive entries one stream at a time. +see_also: httpclient | HttpClient | Process request and response bodies incrementally. +--- -Composable streaming primitives for reading, writing, and transforming byte data in pure PHP. ByteStream provides a pull-based model where you request bytes from a source, peek at or consume them, and optionally transform them through filters like compression or checksums -- all without loading entire files into memory. +Composable streaming primitives for reading, writing, transforming, hashing, and compressing byte data. Pull/peek/consume semantics let parsers backtrack without copying, and deflate, inflate, and checksum filters snap together like Lego. -## Installation +## Why this exists -```bash -composer require wp-php-toolkit/bytestream -``` - -## Quick Start - -```php -use WordPress\ByteStream\ReadStream\FileReadStream; - -// Read a file in chunks -$reader = FileReadStream::from_path( '/path/to/file.txt' ); -while ( ! $reader->reached_end_of_data() ) { - $available = $reader->pull( 1024 ); - $chunk = $reader->consume( $available ); - // Process $chunk... -} -$reader->close_reading(); -``` +

    PHP's native streams are powerful but inconsistent. fread on a socket may return short reads with no warning; stream_filter_append is awkward to compose; gzip helpers and file handles expose different APIs. The ByteStream component normalizes these behind one small interface — pull / peek / consume — so a parser, a hash function, and a deflate filter all see the same shape.

    -## Usage +

    The split between pull (buffer up to N bytes) and consume (advance past N bytes) is the key design decision. Parsers can peek ahead to detect a record boundary and decide whether to consume, without copying or allocating.
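
The peek-then-decide pattern can be sketched with MemoryPipe; the two-byte gzip magic check here is just an illustration.

```php
use WordPress\ByteStream\MemoryPipe;

$pipe = new MemoryPipe();
$pipe->append_bytes( "\x1f\x8b...rest of a gzip stream..." );
$pipe->close_writing();

// Buffer two bytes, then inspect them without advancing the position.
$pipe->pull( 2 );
$magic = $pipe->peek( 2 );

if ( "\x1f\x8b" === $magic ) {
	echo "gzip magic detected; route the stream to a decompressor\n";
} else {
	echo "no gzip magic; consume the bytes as-is\n";
}
// peek() consumed nothing, so whichever branch runs still sees the
// stream from byte 0.
```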

    -### Reading Files +## Read a file in chunks -`FileReadStream` opens a file and exposes it through the pull/consume model. Use `pull()` to buffer bytes, `peek()` to inspect them without advancing, and `consume()` to read and advance the position. +

    The canonical loop. pull(N) reads up to N bytes from the underlying source into an internal buffer and returns how many ended up there; consume(N) reads N bytes from that buffer and advances past them. The buffer never grows beyond the chunk size you ask for.

    + ```php -use WordPress\ByteStream\ReadStream\FileReadStream; - -$reader = FileReadStream::from_path( '/path/to/data.bin' ); - -// Pull up to 100 bytes into the internal buffer -$reader->pull( 100 ); - -// Peek at the first 10 bytes without consuming them -$header = $reader->peek( 10 ); +consume( 10 ); - -// Read the current position -$offset = $reader->tell(); // 10 - -// Seek to a specific offset -$reader->seek( 0 ); +use WordPress\ByteStream\ReadStream\FileReadStream; -// Read all remaining bytes at once -$rest = $reader->consume_all(); +$path = tempnam( sys_get_temp_dir(), 'demo' ); +file_put_contents( $path, str_repeat( "log line\n", 200 ) ); +$reader = FileReadStream::from_path( $path ); +$total = 0; +while ( ! $reader->reached_end_of_data() ) { + $n = $reader->pull( 256 ); + if ( 0 === $n ) break; + $total += strlen( $reader->consume( $n ) ); +} $reader->close_reading(); +echo "Read {$total} bytes in 256-byte chunks.\n"; ``` -You can also create a `FileReadStream` from an existing resource handle: - -```php -$handle = fopen( '/path/to/file.txt', 'r' ); -$reader = FileReadStream::from_resource( $handle, filesize( '/path/to/file.txt' ) ); + ``` +Read 1800 bytes in 256-byte chunks. +``` + +## MemoryPipe as write-then-read buffer -### In-Memory Streams with MemoryPipe +

    MemoryPipe is bidirectional: you append_bytes() as a writer and pull/consume as a reader. Easiest way to wire one component's output into another's input.

    -`MemoryPipe` holds data in memory and supports both reading and writing. It is useful for testing, for wrapping string data in the stream interface, or for piping data between components. +

    Gotcha: A producer must call close_writing() when done — otherwise the consumer eventually throws NotEnoughDataException instead of seeing EOF.

    + ```php -use WordPress\ByteStream\MemoryPipe; +pull( 5 ); -echo $pipe->consume( 5 ); // "Hello" +use WordPress\ByteStream\MemoryPipe; -// Use as a write-then-read pipe -$pipe = new MemoryPipe( null, 1024 ); // Expected length of 1024 -$pipe->append_bytes( 'chunk one ' ); -$pipe->append_bytes( 'chunk two' ); +$pipe = new MemoryPipe(); +$pipe->append_bytes( "first chunk\n" ); +$pipe->append_bytes( "second chunk\n" ); +$pipe->append_bytes( "third chunk\n" ); $pipe->close_writing(); -echo $pipe->consume_all(); // "chunk one chunk two" +while ( ! $pipe->reached_end_of_data() ) { + $n = $pipe->pull( 1024 ); + if ( 0 === $n ) break; + echo "got: " . $pipe->consume( $n ); +} ``` -### Writing Files - -`FileWriteStream` appends data to a file. It supports truncating or appending to existing files. - -```php -use WordPress\ByteStream\WriteStream\FileWriteStream; - -// Truncate and write -$writer = FileWriteStream::from_path( '/path/to/output.txt', 'truncate' ); -$writer->append_bytes( 'First line' ); -$writer->append_bytes( "\nSecond line" ); -$writer->close_writing(); - -// Append to existing file -$writer = FileWriteStream::from_path( '/path/to/output.txt', 'append' ); -$writer->append_bytes( "\nThird line" ); -$writer->close_writing(); + ``` - -### Reading and Writing the Same File - -`FileReadWriteStream` provides both read and write access to a single file. Writes always append to the end while reads track their own position independently. 
- -```php -use WordPress\ByteStream\FileReadWriteStream; - -$stream = FileReadWriteStream::from_path( '/tmp/buffer.bin', true ); -$stream->append_bytes( 'Hello' ); -$stream->append_bytes( ' World' ); - -// Read back from the beginning -$stream->pull( 11 ); -echo $stream->consume( 11 ); // "Hello World" - -$stream->close_writing(); -$stream->close_reading(); +got: first chunk +second chunk +third chunk ``` -### Compression and Decompression +## Compress on the way in, decompress on the way out -`DeflateReadStream` compresses data as you read it, and `InflateReadStream` decompresses. They wrap any `ByteReadStream` and produce a new stream of transformed bytes. +

    Wrap a stream in DeflateReadStream to get compressed bytes out; wrap it in InflateReadStream to get decompressed bytes out. Both are full ByteReadStream implementations, so they nest into anything else that takes a stream.

    + ```php +close_writing(); +$deflated = new DeflateReadStream( $src, ZLIB_ENCODING_DEFLATE ); $compressed = $deflated->consume_all(); -// Decompress -$compressed_source = new MemoryPipe( $compressed ); -$inflated = new InflateReadStream( $compressed_source, ZLIB_ENCODING_DEFLATE ); -echo $inflated->consume_all(); // "The quick brown fox jumps over the lazy dog." -``` - -### Transforming Streams with Filters - -`TransformedReadStream` and `TransformedWriteStream` apply a chain of `ByteTransformer` filters as data flows through the stream. Built-in transformers include `ChecksumTransformer`, `DeflateTransformer`, and `InflateTransformer`. - -```php -use WordPress\ByteStream\ReadStream\FileReadStream; -use WordPress\ByteStream\ReadStream\TransformedReadStream; -use WordPress\ByteStream\ByteTransformer\ChecksumTransformer; - -// Read a file and compute its SHA-1 hash as you go -$checksum = new ChecksumTransformer( 'sha1' ); -$reader = FileReadStream::from_path( '/path/to/file.txt' ); -$stream = new TransformedReadStream( $reader, array( 'checksum' => $checksum ) ); +$src2 = new MemoryPipe( $compressed ); +$src2->close_writing(); +$inflated = new InflateReadStream( $src2, ZLIB_ENCODING_DEFLATE ); +$round = $inflated->consume_all(); -$contents = $stream->consume_all(); -echo $stream['checksum']->get_hash(); // SHA-1 hex digest +printf( "original : %d bytes\n", strlen( $original ) ); +printf( "deflated : %d bytes (%.1f%%)\n", strlen( $compressed ), 100 * strlen( $compressed ) / strlen( $original ) ); +printf( "round-trip: %s\n", $round === $original ? 
'OK' : 'BROKEN' ); ``` -Compress data as you write it: - -```php -use WordPress\ByteStream\WriteStream\FileWriteStream; -use WordPress\ByteStream\WriteStream\TransformedWriteStream; -use WordPress\ByteStream\ByteTransformer\DeflateTransformer; - -$file_writer = FileWriteStream::from_path( '/path/to/output.deflate', 'truncate' ); -$writer = new TransformedWriteStream( - $file_writer, - array( new DeflateTransformer( ZLIB_ENCODING_DEFLATE ) ) -); -$writer->append_bytes( 'Data to compress...' ); -$writer->close_writing(); -$file_writer->close_writing(); + ``` - -### Limiting Read Length - -`LimitedByteReadStream` restricts reading to a fixed number of bytes from a larger stream. This is useful for reading structured binary formats where you know the length of each section. - -```php -use WordPress\ByteStream\ReadStream\FileReadStream; -use WordPress\ByteStream\ReadStream\LimitedByteReadStream; - -$reader = FileReadStream::from_path( '/path/to/archive.bin' ); - -// Read only the first 256 bytes -$header_reader = new LimitedByteReadStream( $reader, 256 ); -$header = $header_reader->consume_all(); -echo $header_reader->length(); // 256 +original : 1050 bytes +deflated : 45 bytes (4.3%) +round-trip: OK ``` -### Pull Modes +## Line-by-line reads from a chunked source -The `pull()` method supports two modes that control how bytes are buffered: +

    Reading text by line means handling chunk boundaries that fall mid-line. Keep the trailing partial line and prepend it to the next pull. The rest of the loop pretends the data was always whole.

    + ```php -use WordPress\ByteStream\ReadStream\ByteReadStream; - -// PULL_NO_MORE_THAN (default): pull up to N bytes, may return fewer -$available = $reader->pull( 1024 ); -$chunk = $reader->consume( $available ); +pull( 100, ByteReadStream::PULL_EXACTLY ); -$chunk = $reader->consume( 100 ); -``` +use WordPress\ByteStream\MemoryPipe; -## API Reference +$pipe = new MemoryPipe(); +$pipe->append_bytes( "alpha\nbravo\ncharl" ); +$pipe->append_bytes( "ie\ndelta\necho\n" ); +$pipe->close_writing(); -### Interfaces +$tail = ''; +$count = 0; +while ( ! $pipe->reached_end_of_data() ) { + $n = $pipe->pull( 8 ); + if ( 0 === $n ) break; + $buf = $tail . $pipe->consume( $n ); + $lines = explode( "\n", $buf ); + $tail = array_pop( $lines ); + foreach ( $lines as $line ) { + printf( "[%d] %s\n", ++$count, $line ); + } +} +if ( '' !== $tail ) { + printf( "[%d] %s\n", ++$count, $tail ); +} +``` -| Interface | Methods | -|---|---| -| `ByteReadStream` | `pull()`, `peek()`, `consume()`, `consume_all()`, `seek()`, `tell()`, `length()`, `reached_end_of_data()`, `close_reading()` | -| `ByteWriteStream` | `append_bytes()`, `close_writing()` | -| `BytePipe` | Combines `ByteReadStream` and `ByteWriteStream` | -| `ByteTransformer` | `filter_bytes()`, `flush()` | + +``` +[1] alpha +[2] bravo +[3] charlie +[4] delta +[5] echo +``` -### Read Stream Classes +## Limit a stream to a fixed window -| Class | Description | -|---|---| -| `FileReadStream` | Reads from a file via `from_path()` or `from_resource()` | -| `InflateReadStream` | Decompresses a wrapped `ByteReadStream` | -| `DeflateReadStream` | Compresses a wrapped `ByteReadStream` | -| `TransformedReadStream` | Applies a chain of `ByteTransformer` filters while reading | -| `LimitedByteReadStream` | Limits reading to a fixed byte count from a larger stream | +

    LimitedByteReadStream exposes only the next N bytes of an underlying stream as if those were the entire stream. This is how the ZIP decoder hands you the body of one entry without letting you read into the next.

    -### Write Stream Classes + +```php +close_writing(); -| Class | Description | -|---|---| -| `MemoryPipe` | In-memory read/write buffer (implements `BytePipe`) | -| `FileReadWriteStream` | File-backed read/write stream (implements `BytePipe`) | -| `ChecksumTransformer` | Computes a hash (SHA-1, MD5, etc.) as bytes flow through | -| `DeflateTransformer` | Compresses bytes as a write-side transformer | -| `InflateTransformer` | Decompresses bytes as a write-side transformer | +$source->pull( 10 ); +$source->consume( 10 ); -## Requirements +$body = new LimitedByteReadStream( $source, 16 ); +echo "body sees: " . $body->consume_all() . "\n"; +echo "remaining in source: " . $source->consume_all() . "\n"; +``` -- PHP 7.2+ -- No external dependencies + +``` +body sees: BODY:hello there +remaining in source: |FOOTER:done +``` diff --git a/components/CLI/README.md b/components/CLI/README.md index 6bd7483e3..017eeaf18 100644 --- a/components/CLI/README.md +++ b/components/CLI/README.md @@ -1,163 +1,232 @@ -# CLI +--- +slug: cli +title: CLI +install: wp-php-toolkit/cli - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/cli.html](https://wordpress.github.io/php-toolkit/reference/cli.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: filesystem | Filesystem | Keep command behavior testable with in-memory storage. +see_also: blueprints | Blueprints | Build repeatable site setup commands around parsed options. +see_also: httpserver | HttpServer | Add a local web UI to a CLI workflow. +--- -A POSIX-style command-line argument parser for PHP. It handles long options (`--verbose`), short options (`-v`), bundled short options (`-abc`), inline values (`--port=8080`, `-p=8080`), and positional arguments -- all in a single static method call with no external dependencies. +POSIX-style argument parser. Long options, short bundles, inline values, positional args — one static call. 
-## Installation +## Why this exists -```bash -composer require wp-php-toolkit/cli -``` +

    Real CLI tools in PHP usually mean either pulling in symfony/console (and the transitive dependencies that come with it) or hand-rolling argv parsing that breaks the first time someone writes -vvv or --port=8080. The toolkit's CLI class is one static method, no dependencies, and handles the POSIX shapes you actually see.

    + +## Parse a single flag -## Quick Start +

    The smallest useful invocation: one boolean flag, one positional. Each option is a four-tuple of [ short, has_value, default, description ].

    + ```php + array( 'o', true, null, 'Output file path' ), - 'force' => array( 'f', false, false, 'Overwrite existing files' ), + 'verbose' => array( 'v', false, false, 'Enable verbose output' ), ); -$argv = array( '--output', '/tmp/result.txt', '-f', 'input.json' ); - -list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); +list( $positionals, $options ) = CLI::parse_command_args_and_options( + array( '-v', 'input.txt' ), + $option_defs +); -// $positionals = array( 'input.json' ) -// $options = array( 'output' => '/tmp/result.txt', 'force' => true ) +echo "verbose: " . ( $options['verbose'] ? 'yes' : 'no' ) . "\n"; +echo "input: " . $positionals[0] . "\n"; ``` -## Usage + +``` +verbose: yes +input: input.txt +``` -### Defining Options +## Mix values, flags, and bundles -Each option is defined as an entry in an associative array. The key is the long option name, and the value is a four-element array: +

    The parser accepts --port 8080, --port=8080, -p 8080, and -p=8080. It also expands bundled boolean shorts such as -afv.

    + ```php -$option_defs = array( - // 'long-name' => array( short, hasValue, default, description ) - 'site-url' => array( 'u', true, null, 'Public site URL' ), - 'site-path' => array( null, true, null, 'Target directory (no short form)' ), - 'help' => array( 'h', false, false, 'Show help message' ), - 'verbose' => array( 'v', false, false, 'Enable verbose output' ), -); -``` - -| Element | Type | Meaning | -|-----------|----------------|------------------------------------------------------| -| `short` | `string\|null` | Single-character short alias, or `null` for none | -| `hasValue`| `bool` | `true` if the option takes a value, `false` for flags | -| `default` | `mixed` | Default value when the option is not provided | -| `description` | `string` | Human-readable description (for help text) | - -### Long Options + array( 'p', true, '3000', 'Server port' ), + 'all' => array( 'a', false, false, 'Process everything' ), + 'force' => array( 'f', false, false, 'Overwrite existing files' ), + 'verbose' => array( 'v', false, false, 'Verbose output' ), + 'output' => array( 'o', true, null, 'Output path' ), + 'port' => array( 'p', true, '3000', 'Server port' ), ); -// These are equivalent: -// --port=8080 -// --port 8080 - -$argv = array( '--port=8080' ); +$argv = array( '-afv', '--port=8080', '-o', '/tmp/result.txt', 'input.json' ); list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); -// $options['port'] === '8080' + +echo "input: " . $positionals[0] . "\n"; +echo "flags: " . implode( ', ', array_keys( array_filter( array( + 'all' => $options['all'], + 'force' => $options['force'], + 'verbose' => $options['verbose'], +) ) ) ) . "\n"; +echo "output: " . $options['output'] . "\n"; +echo "port: " . $options['port'] . "\n"; +``` + + +``` +input: input.json +flags: all, force, verbose +output: /tmp/.txt +port: 8080 ``` -### Short Options +## Validate required options -Short options work the same way as long options. 
Boolean flags can be bundled: +

    The parser fills in defaults but never enforces "required". Check for null after parsing — full control over the error message.

    + ```php + array( 'a', false, false, 'Process all items' ), - 'force' => array( 'f', false, false, 'Force overwrite' ), - 'verbose' => array( 'v', false, false, 'Verbose output' ), - 'output' => array( 'o', true, null, 'Output path' ), + 'site-url' => array( 'u', true, null, 'Public site URL (required)' ), + 'site-path' => array( null, true, null, 'Target directory (required)' ), ); -// Bundle boolean flags: -afv is the same as -a -f -v -$argv = array( '-afv' ); -list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); -// $options['all'] === true -// $options['force'] === true -// $options['verbose'] === true +$argv = array( '--site-url', 'https://mysite.test' ); -// A value-bearing short option can appear at the end of a bundle: -$argv = array( '-afo', '/tmp/out.txt' ); -list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); -// $options['all'] === true -// $options['force'] === true -// $options['output'] === '/tmp/out.txt' +try { + list( , $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); + foreach ( array( 'site-url', 'site-path' ) as $name ) { + if ( null === $options[ $name ] ) { + throw new RuntimeException( "Missing required option --{$name}" ); + } + } + echo "All good.\n"; +} catch ( Exception $e ) { + echo "error: " . $e->getMessage() . 
"\n"; +} ``` -### Positional Arguments - -Any argument that is not an option or an option value is collected as a positional argument: - -```php -$option_defs = array( - 'help' => array( 'h', false, false, 'Show help' ), -); - -$argv = array( 'blueprint.json', '-h', 'extra-arg' ); -list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $option_defs ); -// $positionals = array( 'blueprint.json', 'extra-arg' ) -// $options['help'] === true + +``` +error: Missing required option --site-path ``` -### Error Handling +## Generate --help from definitions -The parser throws `InvalidArgumentException` for unknown options or missing required values: +

    Because each option carries its own description, you can render help text by walking the same definitions you parse with. No second source of truth.

    + ```php -use InvalidArgumentException; + array( 'p', true, null, 'Server port' ), + 'output' => array( 'o', true, null, 'Write result to FILE' ), + 'force' => array( 'f', false, false, 'Overwrite existing files' ), + 'verbose' => array( 'v', false, false, 'Verbose output' ), + 'help' => array( 'h', false, false, 'Show this help and exit' ), ); -try { - $argv = array( '--unknown' ); - CLI::parse_command_args_and_options( $argv, $option_defs ); -} catch ( InvalidArgumentException $e ) { - // "Unknown option --unknown" +function render_help( array $defs ) { + echo "Usage: mytool [options] \n\nOptions:\n"; + foreach ( $defs as $long => $def ) { + list( $short, $has_value, $default, $desc ) = $def; + $flag = ( $short ? "-{$short}, " : ' ' ) . "--{$long}"; + if ( $has_value ) $flag .= '=VALUE'; + echo sprintf( " %-28s %s\n", $flag, $desc ); + } } -try { - $argv = array( '--port' ); // missing value - CLI::parse_command_args_and_options( $argv, $option_defs ); -} catch ( InvalidArgumentException $e ) { - // "Option --port requires a value" -} +list( , $options ) = CLI::parse_command_args_and_options( array( '-h' ), $option_defs ); +if ( $options['help'] ) render_help( $option_defs ); +``` + + ``` +Usage: mytool [options] -## API Reference +Options: + -o, --output=VALUE Write result to FILE + -f, --force Overwrite existing files + -v, --verbose Verbose output + -h, --help Show this help and exit +``` -### `CLI` (class) +## Git-style subcommands -| Method | Description | -|--------|-------------| -| `CLI::parse_command_args_and_options( array $argv, array $option_defs ): array` | Parses CLI arguments and returns `array( $positionals, $options )`. | +

    To build a tool with subcommands like mytool deploy, peel the first positional off argv, dispatch, and parse the rest with a per-command option set.

    -**Parameters:** + +```php + array( + 'env' => array( 'e', true, 'staging', 'Target environment' ), + 'dry-run' => array( 'n', false, false, 'Preview without applying' ), + ), + 'rollback' => array( + 'to' => array( 't', true, null, 'Revision to roll back to' ), + ), +); -**Throws:** `InvalidArgumentException` for unknown options or missing values. +function run( array $argv, array $commands ) { + if ( empty( $argv ) ) { + echo "Usage: mytool [options]\nCommands: " . implode( ', ', array_keys( $commands ) ) . "\n"; + return; + } + $command = array_shift( $argv ); + if ( ! isset( $commands[ $command ] ) ) { + echo "Unknown command: {$command}\n"; + return; + } + list( $positionals, $options ) = CLI::parse_command_args_and_options( $argv, $commands[ $command ] ); + echo "command={$command}\n"; + echo "options: " . json_encode( $options ) . "\n"; + echo "positionals: " . json_encode( $positionals ) . "\n"; +} -## Requirements +run( array( 'deploy', '--env=production', '-n', 'web-01', 'web-02' ), $commands ); +echo "---\n"; +run( array( 'rollback', '-t', 'abc123' ), $commands ); +``` -- PHP 7.2+ -- No external dependencies + +``` +command=deploy +options: {"env":"production","dry-run":true} +positionals: ["web-01","web-02"] +--- +command=rollback +options: {"to":"abc123"} +positionals: [] +``` diff --git a/components/CORSProxy/README.md b/components/CORSProxy/README.md index 51f58e56a..770d2d164 100644 --- a/components/CORSProxy/README.md +++ b/components/CORSProxy/README.md @@ -1,187 +1,151 @@ -# CORSProxy +--- +slug: corsproxy +title: CORSProxy +install: wp-php-toolkit/corsproxy - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/corsproxy.html](https://wordpress.github.io/php-toolkit/reference/corsproxy.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: httpclient | HttpClient | Fetch upstream responses from PHP when browser CORS blocks direct access. 
+see_also: httpserver | HttpServer | Understand the local-server shape before deploying a proxy endpoint. +--- -A PHP CORS proxy that lets browser-based JavaScript make cross-origin requests to external services. Built for WordPress Playground to bridge `fetch()` calls to git servers and other APIs that don't set CORS headers. The proxy streams data bidirectionally, blocks requests to private IP ranges, filters sensitive headers, and enforces size limits -- all without external dependencies. +A small PHP CORS proxy intended for browser-side code that needs to reach servers without CORS headers. -## Installation +## Why this exists -``` -composer require wp-php-toolkit/corsproxy -``` - -## Quick Start +

    A Playground-style browser tool reads https://api.github.com/repos/WordPress/php-toolkit, a plugin ZIP from downloads.wordpress.org, or a raw fixture from GitHub. The browser blocks the response when the upstream server does not send the required CORS headers, even though PHP can fetch the same public URL server-side.

    -Deploy `cors-proxy.php` behind a web server. Clients make requests through the proxy by appending the target URL to the proxy's path: +

    The CORSProxy component is that server-side bridge. It accepts a target URL, fetches it from PHP, and returns a browser-readable response. Because an open proxy is a security and abuse risk, real deployments should add host allowlists, rate limits, header controls, and private-network protections appropriate to their environment.
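The private-network and header protections mentioned above ship as standalone functions in `cors-proxy-functions.php`. A minimal sketch of both, mirroring the component's documented behavior (the `require` path assumes a standard Composer vendor layout):

```php
<?php
// Assumes the package's function file sits at the usual vendor path.
require __DIR__ . '/vendor/wp-php-toolkit/corsproxy/cors-proxy-functions.php';

// Private, loopback, and link-local ranges are always refused as targets.
var_dump( is_private_ip( '127.0.0.1' ) );   // true  - loopback
var_dump( is_private_ip( '192.168.1.1' ) ); // true  - RFC 1918
var_dump( is_private_ip( '8.8.8.8' ) );     // false - public

// Cookie and Host are always stripped; Authorization is forwarded only when
// the client opted in via X-Cors-Proxy-Allowed-Request-Headers.
print_r( filter_headers_by_name(
    array(
        'Accept'        => 'application/json',
        'Cookie'        => 'session=abc',
        'Host'          => 'example.com',
        'Authorization' => 'Bearer token123',
    ),
    array( 'Cookie', 'Host' ),   // always stripped
    array( 'Authorization' )     // requires explicit opt-in
) );
// Without the opt-in request header, only Accept survives.
```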

    -``` -GET https://your-server.com/cors-proxy.php/https://api.example.com/data -``` +## Run the proxy locally -The proxy fetches `https://api.example.com/data`, streams the response back with CORS headers attached, and the browser's same-origin policy is satisfied. +

To run the proxy on your machine it needs a port to listen on: start PHP's built-in server and request any HTTPS URL through it.

    -## Usage +
    PLAYGROUND_CORS_PROXY_DISABLE_RATE_LIMIT=1 \
    +  php -S 127.0.0.1:5263 vendor/wp-php-toolkit/corsproxy/cors-proxy.php
     
    -### Deployment
    +# In another terminal:
    +curl -s "http://127.0.0.1:5263/cors-proxy.php/https://api.github.com/repos/WordPress/php-toolkit" | head
    +
    -Place `cors-proxy.php` and `cors-proxy-functions.php` in a web-accessible directory. The proxy works with Apache, Nginx, or PHP's built-in development server. +## Production rate limiting -For local development: +

Drop a `cors-proxy-config.php` next to `cors-proxy.php`. If that file defines a `playground_cors_proxy_maybe_rate_limit()` function, the proxy calls it before forwarding any request; this is your one chance to reject early. Without the file, the proxy applies its default rate limiter, which is fine for development but should be replaced for any deployment that gets real traffic.

    -```bash -php -S 127.0.0.1:5263 cors-proxy.php -# Then request: http://127.0.0.1:5263/cors-proxy.php/https://w.org/ -``` - -### Rate limiting - -The proxy refuses to run without rate limiting configured. You must do one of the following: - -1. **Define a rate-limiting function** in a `cors-proxy-config.php` file placed alongside `cors-proxy.php`: +

This example uses a per-IP sliding-window request counter stored on disk. Replace it with Redis or memcached for multi-host deployments.

    + ```php $now - $window; + } ); + + if ( count( $hits ) >= $max_req ) { + header( 'Retry-After: ' . $window ); + http_response_code( 429 ); + echo 'Rate limit exceeded'; + exit; + } + + $hits[] = $now; + file_put_contents( $bucket, json_encode( array_values( $hits ) ) ); } -``` -2. **Explicitly disable rate limiting** (development only): - -```php -Out of the box the proxy will fetch any public URL. Most real deployments want a fixed list of upstreams — GitHub, Packagist, wp.org. Both the rate-limit logic and the allowlist live in the same hook, since cors-proxy.php only calls playground_cors_proxy_maybe_rate_limit() once. The example below shows just the allowlist concern; in practice you stack both in one function inside cors-proxy-config.php.

    + ```php -// These are all blocked: -is_private_ip( '127.0.0.1' ); // true - loopback -is_private_ip( '192.168.1.1' ); // true - RFC 1918 -is_private_ip( '10.0.0.1' ); // true - RFC 1918 -is_private_ip( '172.16.0.1' ); // true - RFC 1918 -is_private_ip( '::1' ); // true - IPv6 loopback -is_private_ip( 'fe80::' ); // true - IPv6 link-local - -// Public IPs are allowed: -is_private_ip( '8.8.8.8' ); // false -is_private_ip( '204.79.197.200' ); // false -``` - -IPv4 and IPv6 private ranges are both covered, including loopback, link-local, carrier-grade NAT, documentation ranges, and multicast addresses. - -**Header filtering.** The proxy strips `Cookie` and `Host` headers from forwarded requests. The `Authorization` header requires explicit opt-in through the `X-Cors-Proxy-Allowed-Request-Headers` request header: - -```php -$filtered = filter_headers_by_name( - array( - 'Accept' => 'application/json', - 'Content-Type' => 'application/json', - 'Cookie' => 'session=abc', - 'Host' => 'example.com', - 'Authorization' => 'Bearer token123', - ), - array( 'Cookie', 'Host' ), // always stripped - array( 'Authorization' ) // requires opt-in -); -// Result: ['Accept' => 'application/json', 'Content-Type' => 'application/json'] -// Authorization was stripped because the client did not send -// X-Cors-Proxy-Allowed-Request-Headers: Authorization -``` - -**URL validation.** Target URLs are validated for scheme (only `http` and `https`), checked for embedded credentials, and verified not to point back at the proxy server itself. 
- -### Redirect handling - -When the target server returns a redirect, the proxy rewrites the `Location` header so the client follows the redirect back through the proxy: - -```php -$rewritten = rewrite_relative_redirect( - 'https://w.org/hosting', // original request - '/hosting/', // redirect location - 'https://cors.example.com/proxy.php' // proxy URL -); -// Result: "https://cors.example.com/proxy.php?https://w.org/hosting/" -``` - -This works for both relative and absolute redirects. - -### Extracting the target URL + '/https://example.com' ) ); -// Returns: "https://example.com" - -// Query string style: -// GET /cors-proxy.php?https://example.com -get_target_url( array( 'QUERY_STRING' => 'https://example.com' ) ); -// Returns: "https://example.com" +echo "Allowlist config active.\n"; ``` -### CORS headers - -CORS response headers are added for requests originating from: +## Browser-side fetch through the proxy -- `https://playground.wordpress.net` (when the proxy is hosted elsewhere) -- `localhost` or `127.0.0.1` (for local development) +

Once deployed, the client side is just `fetch()` with the proxy URL. Drop this into any page as a module script (the snippet uses top-level `await`).

    -The proxy responds to `OPTIONS` preflight requests with appropriate `Access-Control-Allow-*` headers. +
    const PROXY = "https://cors.example.com/cors-proxy.php";
     
    -## API Reference
    +async function viaProxy(url, init = {}) {
    +  const res = await fetch(`${PROXY}/${url}`, {
    +    ...init,
    +    headers: {
    +      ...(init.headers || {}),
    +      "X-Cors-Proxy-Allowed-Request-Headers": "Authorization",
    +    },
    +  });
    +  if (!res.ok) throw new Error(`Proxy returned ${res.status}`);
    +  return res;
    +}
     
    -### Functions
    +const repo = await viaProxy("https://api.github.com/repos/WordPress/php-toolkit").then(r => r.json());
    +console.log(repo.full_name, repo.stargazers_count);
    +
    -| Function | Purpose | -|----------|---------| -| `get_target_url( $server_data )` | Extracts the target URL from `$_SERVER` (or a custom array). Returns the URL string or `false`. | -| `get_current_script_uri( $target_url, $request_uri )` | Returns the proxy's own URI prefix (everything before the target URL in the request). | -| `url_validate_and_resolve( $url, $resolve_function )` | Validates a URL (scheme, no credentials, no private IPs) and resolves the hostname. Returns `array( 'host' => ..., 'ip' => ... )` or throws `CorsProxyException`. | -| `is_private_ip( $ip )` | Returns `true` if the IP address falls within any private, loopback, link-local, or reserved range. Supports both IPv4 and IPv6. | -| `filter_headers_by_name( $headers, $disallowed, $opt_in )` | Filters an associative array of headers, removing disallowed ones and enforcing opt-in for sensitive headers. | -| `rewrite_relative_redirect( $request_url, $redirect_location, $proxy_url )` | Rewrites a redirect `Location` to route back through the proxy. | -| `should_respond_with_cors_headers( $host, $origin )` | Returns `true` if the given origin should receive CORS response headers. | +## Deploy behind nginx -### Classes +

The proxy is a single PHP script, so any SAPI works; nginx + php-fpm is a common production setup. `PATH_INFO` is what the proxy reads to learn the target URL.

    -| Class | Purpose | -|-------|---------| -| `IpUtils` | Static methods for private IP detection: `isPrivateIp( $ip )`. Covers RFC 1918, RFC 4193, loopback, link-local, carrier-grade NAT, and more. | -| `CorsProxyException` | Thrown when URL validation fails (invalid scheme, private IP, unresolvable hostname, etc.). | +
    server {
    +  listen 443 ssl http2;
    +  server_name cors.example.com;
     
    -## Requirements
    +  root /var/www/cors-proxy;
    +  index cors-proxy.php;
     
    -- PHP 7.2+
    -- `curl` extension (for proxying HTTP requests)
    -- No other external dependencies
    +  location ~ ^/cors-proxy\.php(/.*)?$ {
    +    fastcgi_pass unix:/run/php/php8.1-fpm.sock;
    +    fastcgi_split_path_info ^(.+\.php)(/.*)$;
    +    fastcgi_param SCRIPT_FILENAME $document_root/cors-proxy.php;
    +    fastcgi_param PATH_INFO $fastcgi_path_info;
    +    include fastcgi_params;
    +  }
    +}
    +
    diff --git a/components/DataLiberation/README.md b/components/DataLiberation/README.md index a41b34831..6dd8b02d8 100644 --- a/components/DataLiberation/README.md +++ b/components/DataLiberation/README.md @@ -1,326 +1,282 @@ -# DataLiberation +--- +slug: dataliberation +title: DataLiberation +install: wp-php-toolkit/data-liberation - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/dataliberation.html](https://wordpress.github.io/php-toolkit/reference/dataliberation.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: ../learn/03-importing-content.html | Tutorial — Markdown to WXR | The chapter that walks through importing a folder of Markdown files into WordPress via the toolkit. +see_also: markdown | Markdown | Use Markdown as a source or destination format. +see_also: blockparser | BlockParser | Analyze serialized blocks inside post content. +see_also: httpclient | HttpClient | Download media and remote source data while importing. +--- -Streaming data import and export for WordPress. Reads and writes WordPress content in multiple formats -- WXR (WordPress eXtended RSS), SQL dumps, block markup, and more -- without loading everything into memory. Designed for migrating content between WordPress sites, converting between formats, and processing large exports that would otherwise exhaust PHP's memory limits. +Streaming WordPress import/export. WXR, SQL, block markup — without loading whole datasets into memory. -## Installation +## Why this exists -``` -composer require wp-php-toolkit/data-liberation -``` - -## Quick Start - -Export a WordPress post to WXR format: - -```php -use WordPress\ByteStream\MemoryPipe; -use WordPress\DataLiberation\EntityWriter\WXRWriter; -use WordPress\DataLiberation\ImportEntity; +

    WordPress content should be portable, but real migrations cross several formats. A site export might arrive as WXR, a Markdown folder, or entities from another CMS. URLs can hide in block attributes, HTML, CSS, feeds, GUIDs, and post meta. Importers must also resume after a failed media download or upload.

    -$output = new MemoryPipe(); -$writer = new WXRWriter( $output ); - -$post = new ImportEntity( 'post', array( - 'post_title' => 'Hello World', - 'post_date' => '2024-01-15', - 'guid' => 'https://example.com/?p=1', - 'content' => '

    Welcome to my site.

    ', - 'excerpt' => 'A short summary.', - 'post_id' => '1', - 'post_name' => 'hello-world', - 'status' => 'publish', - 'post_type' => 'post', -) ); - -$writer->append_entity( $post ); -$writer->finalize(); -$writer->close_writing(); -$output->close_writing(); +

The DataLiberation component streams WordPress-shaped data through readers, transformers, and writers. It models posts, terms, comments, attachments, and metadata as `ImportEntity` objects, then lets a pipeline rewrite each entity without loading the full export into memory.

    -echo $output->consume_all(); -// Outputs a complete WXR XML document with the post. -``` +

    The API reflects specific migration bugs: relative URLs in known block attributes, URLs inside inline CSS, self-closing block comments that must keep their shape, and origin-only URLs whose trailing slash style should not change during a rewrite.
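Those block-markup cases are handled by `BlockMarkupUrlProcessor`, which surfaces every URL a block can carry: HTML attributes, block-comment attributes, text nodes, and inline CSS. A minimal sketch of a domain rewrite using its documented `next_url()`/`set_raw_url()` loop (the second constructor argument is the base URL for resolving relative links):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use WordPress\DataLiberation\BlockMarkup\BlockMarkupUrlProcessor;

$markup = '<!-- wp:image {"url":"https://old-site.com/photo.jpg"} -->'
    . '<figure><img src="https://old-site.com/photo.jpg"></figure>'
    . '<!-- /wp:image -->';

$p = new BlockMarkupUrlProcessor( $markup, 'https://old-site.com' );

while ( $p->next_url() ) {
    // get_raw_url() yields each URL regardless of where it appears;
    // get_parsed_url() offers host/path components when you need them.
    $raw = $p->get_raw_url();
    $p->set_raw_url( str_replace( 'old-site.com', 'new-site.com', $raw ) );
}

echo $p->get_updated_html();
// Both the block-comment attribute and the <img src> now point at new-site.com.
```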

    -## Usage +

    Reach for it when the job combines formats: build WXR from another CMS, rewrite a staging export for production, frontload remote assets, or compose Markdown, XML, HTML, CSS, and URL rewriting into one pipeline.

    -### Writing WXR exports +## Write a WXR file in five lines -`WXRWriter` generates WordPress eXtended RSS (WXR) XML files. You feed it entities one at a time -- posts, metadata, terms, and comments -- and it produces valid WXR output. Entities must be appended in logical order: metadata, terms, and comments belong to the most recently appended post. +

Stream a single post into a WXR document via `WXRWriter`. The writer holds no buffer beyond what is needed to close currently open tags, so memory stays flat regardless of input size.

    + ```php +append_entity( new ImportEntity( 'post', array( - 'post_title' => 'My Article', - 'post_date' => '2024-03-01', - 'guid' => 'https://example.com/?p=42', - 'content' => '

    Article body.

    ', - 'post_id' => '42', - 'post_name' => 'my-article', - 'status' => 'publish', - 'post_type' => 'post', - 'comment_status' => 'open', -) ) ); - -// Attach metadata to that post -$writer->append_entity( new ImportEntity( 'post_meta', array( - 'meta_key' => '_thumbnail_id', - 'meta_value' => '99', -) ) ); - -// Attach a term -$writer->append_entity( new ImportEntity( 'term', array( - 'term_id' => '5', - 'taxonomy' => 'category', - 'slug' => 'tutorials', - 'parent' => '0', + 'post_title' => 'Hello', + 'content' => 'World.', + 'post_id' => '1', + 'status' => 'publish', ) ) ); - -// Attach a comment -$writer->append_entity( new ImportEntity( 'comment', array( - 'comment_id' => '1', - 'comment_author' => 'Jane', - 'comment_content' => 'Great post!', - 'comment_date' => '2024-03-02', - 'comment_approved' => '1', -) ) ); - $writer->finalize(); $writer->close_writing(); -$output->close_writing(); -``` - -The writer supports pausing and resuming via a reentrancy cursor. This lets you split large exports across multiple PHP requests: +$pipe->close_writing(); +$wxr = $pipe->consume_all(); -```php -// Save state after writing some entities -$cursor = $writer->get_reentrancy_cursor(); -$writer->close_writing(); +echo "bytes: " . strlen( $wxr ) . "\n"; +echo false !== strpos( $wxr, 'Hello' ) ? "title exported\n" : "title missing\n"; +echo false !== strpos( $wxr, 'publish' ) ? "status exported\n" : "status missing\n"; +``` -// Later, resume from where you left off -$writer = new WXRWriter( $output, $cursor ); -$writer->append_entity( $next_post ); + +``` +bytes: 475 +title exported +status exported ``` -### Writing SQL dumps +## Build a WXR programmatically from any source -`MySQLDumpWriter` produces SQL INSERT statements from entity data: +

The writer doesn't care where entities come from. Loop over rows from a CMS, a CSV, or a Notion API dump and emit posts plus their related entities; here, each row's tags become term entities.

    + ```php + 10, 'title' => 'About', 'body' => '

    About us.

    ', 'tags' => array( 'company' ) ), + array( 'id' => 11, 'title' => 'Blog', 'body' => '

    Hello world.

    ', 'tags' => array( 'news', 'launch' ) ), +); -$writer->append_entity( new ImportEntity( 'database_row', array( - 'table' => 'wp_posts', - 'record' => array( - 'ID' => 1, - 'post_title' => 'First Post', - 'post_content' => 'Hello World', - ), -) ) ); +$pipe = new MemoryPipe(); +$writer = new WXRWriter( $pipe ); + +foreach ( $rows as $row ) { + $writer->append_entity( new ImportEntity( 'post', array( + 'post_id' => (string) $row['id'], + 'post_title' => $row['title'], + 'content' => $row['body'], + 'status' => 'publish', + 'post_type' => 'post', + ) ) ); + foreach ( $row['tags'] as $i => $tag ) { + $writer->append_entity( new ImportEntity( 'term', array( + 'term_id' => (string) ( $row['id'] * 100 + $i ), + 'taxonomy' => 'post_tag', + 'slug' => $tag, + 'parent' => '0', + ) ) ); + } +} +$writer->finalize(); $writer->close_writing(); -echo $output->consume_all(); -// INSERT INTO wp_posts (ID, post_title, post_content) VALUES (1, 'First Post', 'Hello World'); +$pipe->close_writing(); + +$wxr = $pipe->consume_all(); +echo "items: " . substr_count( $wxr, '' ) . "\n"; +echo "terms: " . substr_count( $wxr, '' ) . "\n"; +echo false !== strpos( $wxr, 'Blog' ) ? "Blog post exported\n" : "Blog post missing\n"; ``` -String values are automatically escaped. NULL values are written as SQL NULL. + +``` +items: 2 +terms: 3 +Blog post exported +``` -### Reading WXR files +## Read entities from a WXR file with constant memory -`WXREntityReader` streams through WXR files and emits entities as it encounters them. It never loads the full document into memory, so it can handle exports of any size: +

`WXREntityReader` emits one entity at a time. A 10 GB WXR uses the same memory as a 10 KB one.

    + ```php + + + +Demo +First1postBody 1 +Second2postBody 2 + + +XML; + $reader = WXREntityReader::create(); -$reader->append_bytes( file_get_contents( 'export.xml' ) ); +$reader->append_bytes( $wxr ); $reader->input_finished(); while ( $reader->next_entity() ) { - $entity = $reader->get_entity(); - switch ( $entity->get_type() ) { - case 'site_option': - $data = $entity->get_data(); - // $data['option_name'], $data['option_value'] - break; - - case 'post': - $data = $entity->get_data(); - // $data['post_title'], $data['post_content'], etc. - break; - - case 'comment': - $data = $entity->get_data(); - // $data['comment_author'], $data['comment_content'], etc. - break; - } + $entity = $reader->get_entity(); + echo $entity->get_type() . ': ' . json_encode( $entity->get_data() ) . "\n"; } ``` -For streaming large files without reading them entirely into memory: - -```php -$reader = WXREntityReader::create(); -$handle = fopen( 'large-export.xml', 'r' ); - -while ( ! feof( $handle ) ) { - $reader->append_bytes( fread( $handle, 65536 ) ); - - while ( $reader->next_entity() ) { - $entity = $reader->get_entity(); - // Process entity... - } -} -fclose( $handle ); + +``` +site_option: {"option_name":"blogname","option_value":"Demo"} +post: {"post_title":"First","post_id":"1","post_type":"post","post_content":"Body 1"} +post: {"post_title":"Second","post_id":"2","post_type":"post","post_content":"Body 2"} ``` -### Processing block markup +## Streaming transform: rewrite URLs while copying WXR -`BlockMarkupProcessor` parses WordPress block comments (like ``) and lets you inspect and modify block names, attributes, and content: +

Wire the reader to the writer to rewrite a WXR file on the fly. This pattern is how you migrate a staging export to production: swap `staging.example.com` for `example.com` without ever loading the file into memory.

    + ```php -use WordPress\DataLiberation\BlockMarkup\BlockMarkupProcessor; - -$markup = '' - . '' - . ''; - -$p = new BlockMarkupProcessor( $markup ); +next_token() ) { - if ( '#block-comment' === $p->get_token_type() ) { - echo $p->get_block_name(); // "wp:image" - $attrs = $p->get_block_attributes(); // ["url" => "/photo.jpg", "class" => "wide"] - echo $p->is_block_closer() ? 'closer' : 'opener'; - } -} -``` +use WordPress\ByteStream\MemoryPipe; +use WordPress\DataLiberation\EntityReader\WXREntityReader; +use WordPress\DataLiberation\EntityWriter\WXRWriter; +use WordPress\DataLiberation\ImportEntity; -Iterate over individual block attributes and modify them: +$source_xml = << + + +Hello1post +Visit https://staging.example.com/about for more. + + +XML; -```php -$p = new BlockMarkupProcessor( - '' -); -$p->next_token(); +$reader = WXREntityReader::create(); +$reader->append_bytes( $source_xml ); +$reader->input_finished(); -while ( $p->next_block_attribute() ) { - $key = $p->get_block_attribute_key(); // "class", then "url" - $value = $p->get_block_attribute_value(); // "wp-bold", then "old.png" +$out_pipe = new MemoryPipe(); +$writer = new WXRWriter( $out_pipe ); - if ( 'url' === $key ) { - $p->set_block_attribute_value( 'new.png' ); - } +while ( $reader->next_entity() ) { + $entity = $reader->get_entity(); + $data = $entity->get_data(); + foreach ( array( 'post_content', 'content', 'description' ) as $field ) { + if ( isset( $data[ $field ] ) ) { + $data[ $field ] = str_replace( 'staging.example.com', 'example.com', $data[ $field ] ); + } + } + if ( 'post' === $entity->get_type() ) { + $data['content'] = isset( $data['post_content'] ) ? $data['post_content'] : ( isset( $data['content'] ) ? 
$data['content'] : '' ); + } + $writer->append_entity( new ImportEntity( $entity->get_type(), $data ) ); } -echo $p->get_updated_html(); -// -``` - -### Rewriting URLs in block markup - -`BlockMarkupUrlProcessor` finds and rewrites URLs across all parts of block markup -- HTML attributes, block comment attributes, text nodes, and inline CSS: - -```php -use WordPress\DataLiberation\BlockMarkup\BlockMarkupUrlProcessor; - -$markup = 'About' - . ''; - -$p = new BlockMarkupUrlProcessor( $markup, 'https://old-site.com' ); - -while ( $p->next_url() ) { - $raw = $p->get_raw_url(); // "https://old-site.com/about", etc. - $parsed = $p->get_parsed_url(); // URL object with host, path, etc. +$writer->finalize(); +$writer->close_writing(); +$out_pipe->close_writing(); - // Rewrite to a new domain - $new_url = str_replace( 'old-site.com', 'new-site.com', $raw ); - $p->set_raw_url( $new_url ); -} +$wxr = $out_pipe->consume_all(); +echo false !== strpos( $wxr, 'https://example.com/about' ) ? "new URL present\n" : "new URL missing\n"; +echo false === strpos( $wxr, 'staging.example.com' ) ? "old URL removed\n" : "old URL still present\n"; +``` -echo $p->get_updated_html(); + +``` +new URL present +old URL removed ``` -### CSS tokenization +## Render Markdown into a WXR import in one pipeline -`CSSProcessor` tokenizes CSS according to the CSS Syntax Level 3 specification. It processes stylesheets one token at a time without building a full AST: +

Compose `MarkdownConsumer` with `WXRWriter` to publish a folder of Markdown directly as a WordPress import file.

    + ```php -use WordPress\DataLiberation\CSS\CSSProcessor; - -$css = 'body { background: url("image.png"); color: red; }'; -$processor = CSSProcessor::create( $css ); +next_token() ) { - echo $processor->get_token_type() . ': ' . $processor->get_normalized_token() . "\n"; +use WordPress\ByteStream\MemoryPipe; +use WordPress\DataLiberation\EntityWriter\WXRWriter; +use WordPress\DataLiberation\ImportEntity; +use WordPress\Markdown\MarkdownConsumer; + +@mkdir( '/tmp/md-src', 0777, true ); +file_put_contents( '/tmp/md-src/hello.md', "---\ntitle: Hello\n---\n\n# Hello\n\nFirst post." ); +file_put_contents( '/tmp/md-src/second.md', "---\ntitle: Second\n---\n\nMore text **here**." ); + +$pipe = new MemoryPipe(); +$writer = new WXRWriter( $pipe ); + +$id = 1; +foreach ( glob( '/tmp/md-src/*.md' ) as $path ) { + $consumer = new MarkdownConsumer( file_get_contents( $path ) ); + $consumer->consume(); + $writer->append_entity( new ImportEntity( 'post', array( + 'post_id' => (string) $id++, + 'post_title' => $consumer->get_meta_value( 'title' ) ?: basename( $path, '.md' ), + 'content' => $consumer->get_block_markup(), + 'status' => 'publish', + 'post_type' => 'post', + 'post_name' => basename( $path, '.md' ), + ) ) ); } -``` - -## API Reference - -### Entity types (ImportEntity) - -| Type | Constants | Key data fields | -|------|-----------|----------------| -| `post` | `ImportEntity::TYPE_POST` | `post_title`, `post_content`, `post_date`, `guid`, `post_name`, `status`, `post_type`, `post_id` | -| `post_meta` | `ImportEntity::TYPE_POST_META` | `meta_key`, `meta_value` | -| `comment` | `ImportEntity::TYPE_COMMENT` | `comment_id`, `comment_author`, `comment_content`, `comment_date`, `comment_approved` | -| `term` | `ImportEntity::TYPE_TERM` | `term_id`, `taxonomy`, `slug`, `parent` | -| `site_option` | `ImportEntity::TYPE_SITE_OPTION` | `option_name`, `option_value` | -| `database_row` | -- | `table`, `record` (associative array of column => value) | - -### Writers 
(EntityWriter interface) - -| Class | Purpose | -|-------|---------| -| `WXRWriter` | Writes WXR XML exports. Constructor takes a `ByteWriteStream`. | -| `MySQLDumpWriter` | Writes SQL INSERT statements. Constructor takes a `ByteWriteStream`. | - -Shared methods: `append_entity( ImportEntity )`, `close_writing()`, `get_reentrancy_cursor()`. -### Readers (EntityReader interface) - -| Class | Purpose | -|-------|---------| -| `WXREntityReader` | Streams WXR XML files. Use `WXREntityReader::create()`. | -| `HTMLEntityReader` | Converts an HTML file into WordPress entities. | -| `EPubEntityReader` | Reads EPUB documents as WordPress entities. | -| `DatabaseRowsEntityReader` | Reads database query results as entities. | -| `FilesystemEntityReader` | Reads a directory tree as entities. | - -Shared methods: `next_entity()`, `get_entity()`, `is_finished()`, `get_reentrancy_cursor()`. - -### Block markup processors - -| Class | Purpose | -|-------|---------| -| `BlockMarkupProcessor` | Parses block comments. Key methods: `next_token()`, `get_block_name()`, `get_block_attributes()`, `is_self_closing_block()`, `is_block_closer()`, `next_block_attribute()`, `set_block_attribute_value()`. | -| `BlockMarkupUrlProcessor` | Finds and rewrites URLs in block markup. Key methods: `next_url()`, `get_raw_url()`, `get_parsed_url()`, `set_raw_url()`. | - -### CSS processors - -| Class | Purpose | -|-------|---------| -| `CSSProcessor` | CSS Syntax Level 3 tokenizer. Key methods: `next_token()`, `get_token_type()`, `get_normalized_token()`. | -| `CSSURLProcessor` | Finds and rewrites URLs inside CSS. | +$writer->finalize(); +$writer->close_writing(); +$pipe->close_writing(); -## Requirements +$wxr = $pipe->consume_all(); +echo "posts: " . substr_count( $wxr, '' ) . "\n"; +echo false !== strpos( $wxr, '<!-- wp:heading' ) ? "block markup exported\n" : "block markup missing\n"; +echo false !== strpos( $wxr, 'Second' ) ? 
"frontmatter title exported\n" : "frontmatter title missing\n"; +``` -- PHP 7.2+ -- No external dependencies + +``` +posts: 2 +block markup exported +frontmatter title exported +``` diff --git a/components/Encoding/README.md b/components/Encoding/README.md index c1a72b642..e963c6d83 100644 --- a/components/Encoding/README.md +++ b/components/Encoding/README.md @@ -1,143 +1,196 @@ -# Encoding +--- +slug: encoding +title: Encoding +install: wp-php-toolkit/encoding - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/encoding.html](https://wordpress.github.io/php-toolkit/reference/encoding.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: html | HTML | Normalize incoming text before HTML tokenization. +see_also: xml | XML | Keep invalid bytes out of XML streams. +see_also: dataliberation | DataLiberation | Clean content before importing it into WordPress. +--- -Pure PHP utilities for UTF-8 validation, scrubbing, and conversion. This component detects invalid byte sequences, replaces them with the Unicode Replacement Character using the maximal subpart algorithm, and provides low-level tools for working with Unicode code points -- all without requiring the `mbstring` extension. When `mbstring` is available, the library delegates to it for better performance. +UTF-8 validation and scrubbing with a pure-PHP fallback when mbstring is unavailable. Detects malformed bytes and replaces them per the Unicode maximal-subpart algorithm. -## Installation +## Why this exists -```bash -composer require wp-php-toolkit/encoding -``` - -## Quick Start - -```php -use function WordPress\Encoding\wp_is_valid_utf8; -use function WordPress\Encoding\wp_scrub_utf8; - -// Validate a string -wp_is_valid_utf8( 'Hello, world!' ); // true -wp_is_valid_utf8( "invalid \xC0 byte" ); // false +

    Every parser in this toolkit eventually has to decide what to do with text bytes. XML rejects malformed UTF-8. JSON and databases can fail late. CSS, HTML, WXR, and Blueprint validation all need consistent answers about whether a string is well-formed Unicode.

    -// Replace invalid bytes with the replacement character -echo wp_scrub_utf8( "caf\xC0 latte" ); // "caf\xEF\xBF\xBD latte" (caf? latte) -``` +

The Encoding component provides the small UTF-8 primitives the rest of the toolkit can share: validate bytes, scrub invalid sequences, scan code points, and detect Unicode noncharacters. When `mbstring` is available it can delegate to it; when it is not, the component uses its own byte scanner so the same behavior stays available in restricted PHP environments.
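The code-point primitives round out the set. Per the component's function reference, `codepoint_to_utf8_bytes()` encodes a code-point number into its UTF-8 byte sequence (invalid code points produce the replacement character), and `utf8_ord()` decodes a single UTF-8 character back to its number:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use function WordPress\Encoding\codepoint_to_utf8_bytes;
use function WordPress\Encoding\utf8_ord;

echo utf8_ord( 'A' ), "\n";            // 65
echo utf8_ord( "\xE2\x9C\x8F" ), "\n"; // 9999 (U+270F, pencil)

// Round-trip: the code point back to its UTF-8 bytes.
echo bin2hex( codepoint_to_utf8_bytes( 0x270F ) ), "\n"; // e29c8f
```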

    -## Usage +

Historically, it became the common foundation for Blueprint validation and CSS/XML processing, replacing ad hoc Unicode helpers with the WordPress core UTF-8 routines collected here.

    -### Validating UTF-8 +## Validating UTF-8 before storing it -`wp_is_valid_utf8()` checks whether a byte string is well-formed UTF-8. It rejects overlong sequences, surrogate halves, bytes that are never valid in UTF-8, and incomplete multi-byte sequences. +

`wp_is_valid_utf8()` rejects overlong sequences, surrogate halves, and stray ISO-8859-1 bytes. Use it as a guard in front of any code path that assumes UTF-8 (database, JSON, XML).

    + ```php + 'just a test', + 'UTF-8 pencil' => "\xE2\x9C\x8F", + 'latin-1 byte' => "B\xFCch", + 'overlong slash' => "\xC1\xBF", + 'surrogate half' => "\xED\xB0\x80", +); + +foreach ( $samples as $label => $bytes ) { + echo sprintf( "%-14s %s\n", $label . ':', wp_is_valid_utf8( $bytes ) ? 'valid' : 'invalid' ); +} ``` -### Scrubbing Invalid Bytes + +``` +ASCII: valid +UTF-8 pencil: valid +latin-1 byte: invalid +overlong slash: invalid +surrogate half: invalid +``` -`wp_scrub_utf8()` replaces ill-formed byte sequences with the Unicode Replacement Character (U+FFFD). It follows the "maximal subpart" algorithm recommended by the Unicode Standard for secure and interoperable string handling. +## Scrubbing invalid bytes with U+FFFD -```php -use function WordPress\Encoding\wp_scrub_utf8; +

Replace each ill-formed sequence with the Unicode replacement character. Useful right before serializing to XML or JSON, or before sending text to an LLM that would choke on broken bytes.

    -// Valid strings pass through unchanged -wp_scrub_utf8( 'test' ); // "test" + +```php + +``` +the byte � should not be here. +.��. ``` -### Detecting Noncharacters +## Detecting noncharacters MySQL/utf8mb4 will reject -`wp_has_noncharacters()` checks whether a string contains Unicode noncharacters -- code points that are permanently reserved and should not appear in open data interchange. +

    Code points like U+FFFE, U+FFFF, and the U+FDD0–U+FDEF block are valid Unicode but forbidden in XML and rejected by some databases. Check before inserting user-submitted content into a strict utf8mb4 column.

    + ```php + 'normal text', + 'U+FFFE' => "oops \u{FFFE}", + 'U+FDD0' => "hi \u{FDD0} bye", +); -// Normal text -wp_has_noncharacters( 'Hello' ); // false +foreach ( $samples as $label => $text ) { + echo sprintf( "%-12s %s\n", $label . ':', wp_has_noncharacters( $text ) ? 'reject' : 'ok' ); +} ``` -The noncharacter ranges are U+FDD0-U+FDEF, plus U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on through U+10FFFE, U+10FFFF. + +``` +normal text: ok +U+FFFE: reject +U+FDD0: reject +``` -### Converting Code Points to UTF-8 +## Three-way pipeline: validate, scrub, then check noncharacters -`codepoint_to_utf8_bytes()` encodes a Unicode code point number into its UTF-8 byte representation. Invalid code points (surrogate halves, values above U+10FFFF) produce the replacement character. +

Real-world inputs are messy: an old WXR export, a CSV with mixed encodings, a paste from Word. Combining validation, scrubbing, and the noncharacter check covers the three classes of breakage that bite later.

    + ```php -use function WordPress\Encoding\codepoint_to_utf8_bytes; + 'Café', + 'latin1' => "caf\xE9", + 'overlong' => "x\xC1\xBFy", + 'noncharac' => "hi \u{FFFE} there", +); + +foreach ( $inputs as $label => $bytes ) { + $valid = wp_is_valid_utf8( $bytes ); + $cleaned = wp_scrub_utf8( $bytes ); + $weird = wp_has_noncharacters( $cleaned ); + echo sprintf( "%-10s valid=%s noncharacter=%s -> %s\n", $label, $valid ? 'Y' : 'N', $weird ? 'Y' : 'N', $cleaned ); +} ``` -### Decoding UTF-8 to Code Points - -`utf8_ord()` converts a single UTF-8 character (byte sequence) back to its Unicode code point number. - -```php -use function WordPress\Encoding\utf8_ord; - -echo utf8_ord( 'A' ); // 65 (0x41) -echo utf8_ord( "\xE2\x9C\x8F" ); // 9999 (0x270F, Pencil) -echo utf8_ord( "\xF0\x9F\x85\xB0" ); // 127344 (0x1F170) + +``` +good valid=Y noncharacter=N -> Café +latin1 valid=N noncharacter=N -> caf� +overlong valid=N noncharacter=N -> x��y +noncharac valid=Y noncharacter=Y -> hi ￾ there ``` -### How the Fallback Works - -When `mbstring` is available, `wp_is_valid_utf8()` delegates to `mb_check_encoding()` and `wp_scrub_utf8()` delegates to `mb_scrub()`. Without `mbstring`, the library uses a pure-PHP byte scanner (`_wp_scan_utf8()`) that validates byte sequences against the UTF-8 well-formedness table from the Unicode Standard. This fallback is fully conformant and handles all edge cases, including the maximal subpart algorithm for scrubbing. - -The PCRE-based implementation of `wp_has_noncharacters()` is preferred when `PCRE/u` is available. Otherwise, a byte-level fallback scans the string directly. 
- -## API Reference - -### Functions +## Salvaging a legacy ISO-8859-1 column inside a UTF-8 corpus -| Function | Description | -|---|---| -| `wp_is_valid_utf8( $bytes )` | Returns `true` if the string is well-formed UTF-8 | -| `wp_scrub_utf8( $text )` | Replaces invalid byte sequences with U+FFFD | -| `wp_has_noncharacters( $text )` | Returns `true` if the string contains Unicode noncharacters | -| `codepoint_to_utf8_bytes( $codepoint )` | Encodes a code point number to its UTF-8 byte sequence | -| `utf8_ord( $character )` | Decodes a UTF-8 character to its code point number | +

    Old WordPress databases sometimes mix encodings: most rows are UTF-8 but a few were stored as latin-1. Detect the bad rows with wp_is_valid_utf8() and only re-encode those.

    -## Attribution + +```php + 'Plain ASCII', + 2 => 'Café', + 3 => "caf\xE9", + 4 => "weird \xC0 byte", +); + +foreach ( $rows as $id => $value ) { + if ( wp_is_valid_utf8( $value ) ) { + echo "#$id ok: $value\n"; + continue; + } + $converted = @iconv( 'ISO-8859-1', 'UTF-8', $value ); + if ( false !== $converted && wp_is_valid_utf8( $converted ) ) { + echo "#$id recovered as latin1: $converted\n"; + } else { + echo "#$id unrecoverable, scrubbing: " . wp_scrub_utf8( $value ) . "\n"; + } +} +``` -- PHP 7.2+ -- No external dependencies (`mbstring` is used when available but is not required) + +``` +#1 ok: Plain ASCII +#2 ok: Café +#3 recovered as latin1: café +#4 recovered as latin1: weird À byte +``` diff --git a/components/Filesystem/README.md b/components/Filesystem/README.md index b6442971a..492dfee29 100644 --- a/components/Filesystem/README.md +++ b/components/Filesystem/README.md @@ -1,146 +1,260 @@ -# Filesystem +--- +slug: filesystem +title: Filesystem +install: wp-php-toolkit/filesystem - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/filesystem.html](https://wordpress.github.io/php-toolkit/reference/filesystem.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: bytestream | ByteStream | Open files as readers and writers instead of loading full strings. +see_also: zip | Zip | Mount archives and copy data between archive-backed and normal filesystems. +see_also: git | Git | Expose repository trees through a filesystem-shaped API. +--- + +One Filesystem interface across local disk, in-memory trees, SQLite databases, and ZIP archives. Forward-slash paths everywhere — even on Windows — so the same code runs in tests, in production, and inside read-only ZIPs. ## Why this exists -PHP's built-in file functions (`file_get_contents`, `fopen`, `mkdir`, etc.) are tightly coupled to the local disk. That's fine for simple scripts, but it creates a real problem when you want to: +

    Code that touches the filesystem is hard to test, hard to port to Windows, and impossible to point at non-disk storage without rewriting it. Swap LocalFilesystem for InMemoryFilesystem in tests and your suite stops touching /tmp; swap it for SQLiteFilesystem and your "files" become rows in a portable database; swap it for ZipFilesystem and you can read inside an archive with the same calls.

    -- **Test code without touching the disk.** Unit tests that create real files are slow, fragile, and leave cleanup responsibilities behind. -- **Work with non-disk storage.** WordPress Playground runs entirely in the browser using a virtual filesystem backed by a SQLite database. Your code needs to work the same way against both a real disk and an in-memory tree. -- **Operate on ZIP archives as if they were directories.** Instead of extracting first and then reading, you want to walk a ZIP file the same way you'd walk a folder. -- **Stay portable across operating systems.** Windows uses backslashes; everything else uses forward slashes. Code that hardcodes separators breaks on the other platform. +

    Every backend uses forward slashes regardless of host OS. No DIRECTORY_SEPARATOR juggling, no Windows-only test failures, no surprises when a path moves between backends.
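If an OS-native path arrives from outside (a CLI argument, $_SERVER, realpath()), convert it once at the boundary and use forward slashes everywhere downstream. A plain-PHP sketch using only builtins, not a component API:

```php
<?php
// Convert an OS-native path to the forward-slash form every backend
// expects. In single-quoted PHP strings, '\\' is one backslash.
function to_forward_slashes( $native_path ) {
	return str_replace( '\\', '/', $native_path );
}

echo to_forward_slashes( 'C:\\www\\site\\index.php' ) . "\n"; // C:/www/site/index.php
echo to_forward_slashes( '/var/www/site/index.php' ) . "\n";  // /var/www/site/index.php
```

Paths that are already forward-slash pass through unchanged, so the conversion is safe to apply unconditionally.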

    -This component defines a single `Filesystem` interface and several implementations behind it. Write your code against the interface once, and it works against any backend. +## In-memory tree -## How it works +

    The fastest backend. No disk I/O, no cleanup, no test-isolation problems.

    -The `Filesystem` interface defines the operations every backend must support: listing directories, reading and writing files, checking existence, copying, renaming, deleting. Implementations handle the translation to whatever storage mechanism is underneath. + +```php +put_contents( '/hello.txt', 'Hello, world!' ); +echo $fs->get_contents( '/hello.txt' ); +``` -The `FilesystemVisitor` handles recursive tree traversal, emitting events for each directory and file it encounters. + +``` +Hello, world! +``` -### The implementations +## Test code without touching disk -**`LocalFilesystem`** — wraps PHP's built-in file functions. Works on the actual disk. +

Code that takes a Filesystem parameter instead of calling file_get_contents() directly can be tested against an InMemoryFilesystem. The test sets up files in memory, exercises the function, and asserts on what got written — no temp directories, no cleanup.

    -**`InMemoryFilesystem`** — stores everything in a PHP array. Fast, zero I/O, perfect for tests and ephemeral scratch space. + +```php +get_contents( $path ), true ); + list( $maj, $min, $patch ) = explode( '.', $json['version'] ); + $json['version'] = $maj . '.' . $min . '.' . ( (int) $patch + 1 ); + $fs->put_contents( $path, json_encode( $json ) ); +} -**`UploadedFilesystem`** — wraps another filesystem and tracks which paths were written, for auditing what an operation produced. +$fs = InMemoryFilesystem::create(); +$fs->put_contents( '/package.json', '{"version":"1.2.3"}' ); +bump_version( $fs, '/package.json' ); -### ChrootLayer +echo $fs->get_contents( '/package.json' ) . "\n"; +``` -Many factory methods wrap a filesystem in a `ChrootLayer`, which jails all path operations to a specific root directory. This prevents code from accidentally escaping to `/` and makes it safe to hand a filesystem object to untrusted code. + +``` +{"version":"1.2.4"} +``` -## Usage +## Local disk with a chrooted root -### Read a file +

LocalFilesystem::create($root) is implicitly chrooted: every path resolves relative to $root, and ../ sequences cannot escape it. Reach for it when a request path or CLI argument names a file inside one project directory.

    + ```php +is_file( '/wp-config.php' ) ) { - $contents = $fs->get_contents( '/wp-config.php' ); -} -``` +$fs->mkdir( '/uploads', array( 'recursive' => true ) ); +$fs->put_contents( '/uploads/note.txt', 'Hi from local disk.' ); -### Write a file +echo $fs->get_contents( '/uploads/../uploads/note.txt' ) . "\n"; -```php -$fs->put_contents( '/uploads/hello.txt', 'Hello, world.' ); +$fs->rmdir( '/', array( 'recursive' => true ) ); +echo "exists after cleanup? " . ( is_dir( $root ) ? 'yes' : 'no' ) . "\n"; +``` + + ``` +Hi from local disk. +exists after cleanup? no +``` + +## SQLite as a portable file store -### List a directory +

    The whole tree lives in one SQLite database file. Use it for self-contained scratch storage that survives process boundaries without leaving loose files behind.

    + ```php -foreach ( $fs->ls( '/wp-content/plugins' ) as $name ) { - echo $name . "\n"; // plugin directory names only, not full paths +mkdir( '/posts', array( 'recursive' => true ) ); +for ( $i = 1; $i <= 3; $i++ ) { + $fs->put_contents( "/posts/post-{$i}.md", "# Post {$i}\n\nBody {$i}." ); +} + +foreach ( $fs->ls( '/posts' ) as $name ) { + $first = strtok( $fs->get_contents( '/posts/' . $name ), "\n" ); + echo "{$name}: {$first}\n"; } ``` -### Use an in-memory filesystem for tests + +``` +post-1.md: # Post 1 +post-2.md: # Post 2 +post-3.md: # Post 3 +``` + +## Copy a tree across backends -Because your code accepts a `Filesystem` interface, you can swap in `InMemoryFilesystem` in tests without changing anything else: +

    The killer composability move: copy_between_filesystems() streams files chunk-by-chunk from any source to any target. Pull a ZIP into SQLite, snapshot SQLite to disk, mirror disk into RAM — all the same call.

    + ```php -use WordPress\Filesystem\InMemoryFilesystem; +put_contents( '/config.json', json_encode( [ 'debug' => true ] ) ); +use WordPress\Filesystem\InMemoryFilesystem; +use WordPress\Filesystem\LocalFilesystem; +use WordPress\Filesystem\SQLiteFilesystem; +use function WordPress\Filesystem\copy_between_filesystems; + +$root = sys_get_temp_dir() . '/copytree-' . uniqid(); +$local = LocalFilesystem::create( $root ); +$local->mkdir( '/site/posts', array( 'recursive' => true ) ); +$local->put_contents( '/site/posts/2024-01.md', '# Hello 2024' ); +$local->put_contents( '/site/index.html', '

    Home

    ' ); + +$sqlite = SQLiteFilesystem::create( ':memory:' ); +copy_between_filesystems( array( + 'source_filesystem' => $local, + 'source_path' => '/site', + 'target_filesystem' => $sqlite, + 'target_path' => '/snapshot', +) ); + +$mem = InMemoryFilesystem::create(); +copy_between_filesystems( array( + 'source_filesystem' => $sqlite, + 'source_path' => '/snapshot', + 'target_filesystem' => $mem, + 'target_path' => '/copy', +) ); + +echo "in memory after two copies:\n"; +echo " posts: " . implode( ', ', $mem->ls( '/copy/posts' ) ) . "\n"; +echo " index: " . $mem->get_contents( '/copy/index.html' ) . "\n"; + +$local->rmdir( '/', array( 'recursive' => true ) ); +``` -// Pass $fs to the code under test — it never touches the real disk. -$result = my_config_loader( $fs ); + +``` +in memory after two copies: + posts: 2024-01.md + index:

    Home

    ``` -### Walk a directory tree +## Atomic write via tempfile rename + +

    Write to a sibling tempfile, then rename — that's how you avoid leaving a half-written file on crash. rename() is atomic within a single filesystem.
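The same tempfile-then-rename pattern works with PHP's native functions when no Filesystem object is in play. A minimal plain-PHP sketch (builtins only, not this component's API):

```php
<?php
// Write to a random sibling tempfile, then rename over the target.
// On POSIX systems rename() atomically replaces the destination,
// so readers see either the old bytes or the new bytes, never half.
function native_atomic_put_contents( $path, $bytes ) {
	$tmp = $path . '.tmp.' . bin2hex( random_bytes( 4 ) );
	if ( false === file_put_contents( $tmp, $bytes ) ) {
		return false;
	}
	return rename( $tmp, $path );
}

$path = sys_get_temp_dir() . '/atomic-demo-' . uniqid() . '.json';
file_put_contents( $path, '{"v":1}' );
native_atomic_put_contents( $path, '{"v":2}' );
echo file_get_contents( $path ) . "\n"; // {"v":2}
unlink( $path );
```

The tempfile must live in the same directory as the target: a rename across mount points degrades to copy-plus-delete and loses atomicity.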

    + ```php -use WordPress\Filesystem\Visitor\FilesystemVisitor; +next() ) { - $event = $visitor->get_event(); - echo $event->get_path() . ( $event->is_dir() ? '/' : '' ) . "\n"; +use WordPress\Filesystem\Filesystem; +use WordPress\Filesystem\LocalFilesystem; + +function atomic_put_contents( Filesystem $fs, $path, $bytes ) { + $tmp = $path . '.tmp.' . bin2hex( random_bytes( 4 ) ); + $fs->put_contents( $tmp, $bytes ); + $fs->rename( $tmp, $path ); } -``` -### Stream large files +$root = sys_get_temp_dir() . '/atomic-' . uniqid(); +$fs = LocalFilesystem::create( $root ); -For large files, streaming avoids loading everything into memory at once: +$fs->put_contents( '/config.json', '{"v":1}' ); +atomic_put_contents( $fs, '/config.json', '{"v":2}' ); -```php -$read_stream = $fs->open_read_stream( '/large-export.sql' ); -$write_stream = $fs->open_write_stream( '/large-export-copy.sql' ); +echo "config: " . $fs->get_contents( '/config.json' ) . "\n"; +echo "no .tmp leftovers: " . count( $fs->ls( '/' ) ) . " entries in root\n"; -while ( ! $read_stream->is_finished() ) { - $chunk = $read_stream->read( 65536 ); // 64 KB at a time - $write_stream->write( $chunk ); -} +$fs->rmdir( '/', array( 'recursive' => true ) ); +``` -$read_stream->close(); -$write_stream->close(); + +``` +config: {"v":2} +no .tmp leftovers: 1 entries in root ``` -### Copy files between different backends +## Path helpers that behave the same on Windows -Because every backend speaks the same interface, you can copy between them directly: +

    Unix path semantics apply on every host OS. This matters for abstract paths such as a SQLite key or a ZIP entry name because those paths do not live on a real drive.
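As a baseline, PHP's own path builtins already return forward-slash results for forward-slash inputs. The snippet below uses only builtins, not this component's helper functions:

```php
<?php
// PHP builtins, not this component's helpers: with forward-slash
// inputs, these return forward-slash results on POSIX systems.
echo dirname( '/var/www/site/index.php' ) . "\n"; // /var/www/site
echo dirname( '/a/b/c' ) . "\n";                  // /a/b
echo basename( 'a/c/e' ) . "\n";                  // e
```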

    + ```php -use WordPress\Filesystem\LocalFilesystem; -use WordPress\Filesystem\InMemoryFilesystem; -use WordPress\Filesystem\Visitor\FilesystemVisitor; - -$local = new LocalFilesystem( '/var/www/html' ); -$memory = new InMemoryFilesystem(); - -// Copy everything from disk to memory. -$visitor = new FilesystemVisitor( $local, '/' ); -while ( $visitor->next() ) { - $event = $visitor->get_event(); - $path = $event->get_path(); - if ( $event->is_file() ) { - $memory->put_contents( $path, $local->get_contents( $path ) ); - } elseif ( $event->is_dir() ) { - $memory->mkdir( $path ); - } -} -``` + +``` +/var/www/site/index.php +/a/b +a/c/e +``` diff --git a/components/Git/README.md b/components/Git/README.md index f6128fb09..58076fb53 100644 --- a/components/Git/README.md +++ b/components/Git/README.md @@ -1,134 +1,273 @@ -# Git +--- +slug: git +title: Git +install: wp-php-toolkit/git - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/git.html](https://wordpress.github.io/php-toolkit/reference/git.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +see_also: filesystem | Filesystem | Work with repository trees through a storage abstraction. +see_also: merge | Merge | Resolve divergent histories with explicit three-way merge logic. +see_also: bytestream | ByteStream | Read and write object data without accidental buffering. +--- + +A pure-PHP Git client and server. Commits, branches, diffs, HTTP push/pull — all without shelling out to git. ## Why this exists -Git is typically used through the `git` binary — a compiled C program that reads and writes the repository on disk. That's perfect for most development workflows, but it breaks down in a few important scenarios: +

    Git is a useful storage model even when a server cannot run the git binary: snapshots, branches, object-addressed files, diffs, merges, and sync over HTTP. That matters for WordPress tools that want revision history for generated files, content snapshots, site state, or collaborative edits in constrained runtimes.

    + +

The Git component implements the core repository operations in PHP and stores objects through the toolkit's Filesystem interface. That means the same repository can live on disk, in memory, or in another backend, and higher-level code can commit files without knowing where objects are stored.

    -- **Serverless and sandboxed environments.** WordPress Playground runs PHP entirely in the browser via WebAssembly. There is no OS, no filesystem, no ability to exec a subprocess. Yet Playground needs to clone, commit, and push WordPress installations as Git repositories. -- **Programmatic repository manipulation.** Sometimes you want to create commits, rewrite history, or sync files between repositories entirely from PHP — without spawning a shell process or depending on the `git` binary being installed. -- **Embedding Git into a PHP application.** Build tools, deployment systems, and migration scripts that want to produce or consume Git repositories without a compile-time dependency on libgit2 or similar native libraries. +

    The docs start with simple commits because that mental model scales: a repository is just objects plus refs. From there, branches, history walking, root commits, and merges become details you can reason about instead of magic shell behavior.

    -This component implements the Git object model, pack protocol, and HTTP smart transport in pure PHP. It can talk to any standard Git remote — GitHub, GitLab, Gitea, self-hosted — using only PHP's HTTP client. +

    Choose it for tests, browser-like sandboxes, hosted WordPress environments, and applications that need Git behavior through PHP APIs instead of shell commands.

    -## How it works +## Commit files into an in-memory repo -Git's data model is simpler than it looks. Everything is content-addressed: the SHA-1 hash of an object's content is its name. There are four object types: +

    The simplest possible repository: an InMemoryFilesystem as object storage and one commit() call. Reach for this in tests, in WP-CLI snapshots, or any place you want versioning without touching disk.

    -- **blob** — file content, nothing else. -- **tree** — a directory listing: each entry maps a filename to either a blob hash (file) or another tree hash (subdirectory). -- **commit** — a snapshot: it points to a tree (the root of the working directory), zero or more parent commit hashes, and metadata like the author and message. -- **tag** — a named pointer to another object (usually a commit). + +```php +commit( array( + 'updates' => array( + 'README.md' => "# My Project\n", + 'src/hello-world.php' => 'get_branch_tip( 'HEAD' ) . "\n"; +echo "README: " . $repo->read_object_by_path( '/README.md' )->consume_all(); +``` -### Create a new repository and make a commit + +``` +commit: +HEAD: +README: # My Project +``` +## Walk the commit history + +

    Follow the parent chain from HEAD backwards. Building block for a WP-CLI "post revisions" log or a "what changed since release X" report.

    + + ```php -use WordPress\Git\GitRepository; -use WordPress\Filesystem\InMemoryFilesystem; +init(); +use WordPress\Filesystem\InMemoryFilesystem; +use WordPress\Git\GitRepository; +use WordPress\Git\Model\Commit; + +$repo = new GitRepository( InMemoryFilesystem::create() ); +foreach ( array( 'add intro', 'fix typo', 'expand examples' ) as $i => $msg ) { + $repo->commit( array( + 'updates' => array( 'post.md' => "# Draft {$i}" ), + 'commit' => array( 'message' => $msg ), + ) ); +} -// Stage a file by writing it to the working directory... -$fs->put_contents( '/hello.txt', 'Hello, world.' ); +$oid = $repo->get_branch_tip( 'HEAD' ); +while ( ! Commit::is_null_hash( $oid ) ) { + $c = $repo->read_object( $oid )->as_commit(); + echo substr( $c->hash, 0, 7 ) . ' ' . trim( $c->message ) . "\n"; + $oid = $c->get_first_parent_hash(); + if ( ! $oid || ! $repo->has_object( $oid ) ) break; +} +``` -// ...then commit. -$repo->stage_files( array( 'hello.txt' ) ); -$repo->commit( 'Initial commit', 'Author Name', 'author@example.com' ); + ``` + expand examples + fix typo + add intro +``` + +## Treat a repository like a filesystem -### Read a file from a specific commit +

    GitFilesystem wraps a repository in this toolkit's Filesystem interface. With the default options, each put_contents() records a new commit.

    + ```php +get_contents( '/hello.txt' ); -// "Hello, world." +$fs->put_contents( '/posts/hello.md', "# Hello\nFirst draft." ); +$fs->put_contents( '/posts/about.md', "# About\nWho we are." ); +$fs->put_contents( '/posts/hello.md', "# Hello\nSecond draft." ); + +echo "tree:\n"; +foreach ( $fs->ls( '/posts' ) as $name ) { + echo " /posts/{$name}\n"; +} +echo "\nhello.md now:\n" . $fs->get_contents( '/posts/hello.md' ) . "\n"; +``` + + ``` +tree: + /posts/about.md + /posts/hello.md + +hello.md now: +# Hello +Second draft. +``` + +## Branch, edit, and switch back -### Clone from a remote +

    Create a feature branch off the current commit, change files, flip HEAD back. Useful for experimental edits in collaborative tools.

    + ```php +init(); +$repo = new GitRepository( InMemoryFilesystem::create() ); +$base = $repo->commit( array( + 'updates' => array( 'config.json' => '{"flag":false}' ), + 'commit' => array( 'message' => 'baseline' ), +) ); -$repo->add_remote( 'origin', 'https://github.com/WordPress/wordpress-develop' ); -$remote = $repo->get_remote_client( 'origin' ); +$repo->create_branch( 'refs/heads/experiment', $base ); +$repo->checkout( 'refs/heads/experiment' ); +$repo->commit( array( + 'updates' => array( 'config.json' => '{"flag":true}' ), + 'commit' => array( 'message' => 'flip the flag' ), +) ); -// Fetch the default branch. -$remote->fetch( 'refs/heads/trunk' ); -``` +echo "on experiment: " . $repo->read_object_by_path( '/config.json' )->consume_all() . "\n"; -### Push to a remote +$repo->checkout( 'refs/heads/trunk' ); +echo "on trunk: " . $repo->read_object_by_path( '/config.json' )->consume_all() . "\n"; +``` -```php -$remote = $repo->get_remote_client( 'origin' ); -$remote->push( 'refs/heads/my-branch' ); + +``` +on experiment: {"flag":true} +on trunk: {"flag":false} ``` -### Read the commit log +## Three-way merge two branches +

    The classic Git workflow: branch off, edit on each side, merge. $repo->merge() finds the common ancestor, three-way-merges every file, and creates a merge commit.

    + + ```php -$head = $repo->get_head(); -$commit = $repo->read_commit( $head ); +message . "\n"; - echo ' by ' . $commit->author_name . ' <' . $commit->author_email . ">\n"; +use WordPress\Filesystem\InMemoryFilesystem; +use WordPress\Git\GitRepository; - $parent_hash = $commit->parent_hash; - $commit = $parent_hash ? $repo->read_commit( $parent_hash ) : null; -} -``` +$repo = new GitRepository( InMemoryFilesystem::create() ); +$base = $repo->commit( array( 'updates' => array( + 'todo.txt' => "buy milk\nwalk dog\nread book\n", +) ) ); -### Diff two commits +$repo->commit( array( 'updates' => array( + 'todo.txt' => "buy oat milk\nwalk dog\nread book\n", +) ) ); -```php -$changes = $repo->diff( $commit_hash_a, $commit_hash_b ); +$repo->create_branch( 'refs/heads/feature', $base ); +$repo->checkout( 'refs/heads/feature' ); +$repo->commit( array( 'updates' => array( + 'todo.txt' => "buy milk\nwalk dog\nread book\nwrite blog post\n", +) ) ); -foreach ( $changes as $path => $change ) { - echo $change['status'] . ' ' . $path . "\n"; - // 'A' = added, 'M' = modified, 'D' = deleted -} +$repo->checkout( 'refs/heads/trunk' ); +$result = $repo->merge( 'refs/heads/feature' ); + +echo "merge head: {$result['new_head']}\n"; +echo "conflicts: " . ( $result['conflicts'] ? implode( ',', $result['conflicts'] ) : 'none' ) . "\n"; +echo "result:\n" . $repo->read_object_by_path( '/todo.txt' )->consume_all(); ``` -### Use GitFilesystem anywhere a Filesystem is expected + +``` +merge head: +conflicts: none +result: +buy oat milk +walk dog +read book +write blog post +``` + +## Snapshot WordPress options into a repo -Because `GitFilesystem` implements the `Filesystem` interface, you can pass it to any code that operates on a filesystem — including `ZipEncoder` to package a commit as a ZIP file: +

    Serialize a chunk of WP state (options, post meta, a theme config) on every save and commit it. You get free history, diffs between snapshots, and a "rollback to last week" button.

    + ```php -use WordPress\Git\GitFilesystem; -use WordPress\Zip\ZipEncoder; +append_from_filesystem( $git_fs, '/' ); -$encoder->finish(); -``` +use WordPress\Filesystem\InMemoryFilesystem; +use WordPress\Git\GitRepository; + +$repo = new GitRepository( InMemoryFilesystem::create() ); -## Architecture notes +$snapshots = array( + array( 'blogname' => 'My Site', 'posts_per_page' => 10, 'timezone_string' => 'UTC' ), + array( 'blogname' => 'My Site', 'posts_per_page' => 20, 'timezone_string' => 'UTC' ), + array( 'blogname' => 'New Name', 'posts_per_page' => 20, 'timezone_string' => 'Europe/Warsaw' ), +); -Git object storage uses a two-level directory scheme: objects live in `.git/objects/ab/cdef...` where `ab` is the first two hex characters of the SHA-1 hash and `cdef...` is the rest. Pack files (compressed bundles of many objects) live in `.git/objects/pack/`. `GitRepository` handles both loose objects and pack file reading transparently. +foreach ( $snapshots as $i => $options ) { + $repo->commit( array( + 'updates' => array( 'options.json' => json_encode( $options, JSON_PRETTY_PRINT ) ), + 'commit' => array( 'message' => "snapshot #{$i}" ), + ) ); +} + +$head = $repo->get_branch_tip( 'HEAD' ); +$parent = $repo->read_object( $head )->as_commit()->get_first_parent_hash(); +$diff = $repo->diff_commits( $head, $parent ); -The HTTP smart protocol works in two round trips for a fetch: first a discovery request that returns the list of refs the remote knows about, then a pack-file negotiation that uploads a pack containing only the objects you don't already have. `GitRemote` implements this protocol using PHP's HTTP client, with no native dependencies. 
+echo "Files changed in last snapshot:\n"; +foreach ( $diff as $name => $entry ) { + echo " {$name}\n"; +} +``` + + +``` +Files changed in last snapshot: + options.json +``` diff --git a/components/HTML/README.md b/components/HTML/README.md index 3f4ebd90b..b2aa2c50f 100644 --- a/components/HTML/README.md +++ b/components/HTML/README.md @@ -1,147 +1,414 @@ -# HTML +--- +slug: html +title: HTML +install: wp-php-toolkit/html - -> 📚 **Runnable examples:** [https://wordpress.github.io/php-toolkit/reference/html.html](https://wordpress.github.io/php-toolkit/reference/html.html) -> Open the page to edit each snippet in your browser and run it in WordPress Playground. - +credit_title: Ported from WordPress core +credit_body: | + The HTML component is a port of WordPress core's WP_HTML_Tag_Processor and WP_HTML_Processor. Source: WordPress/wordpress-develop. Bug fixes flow in both directions. + +see_also: ../learn/01-rewriting-html.html | Tutorial — Rewriting HTML safely | The chapter that introduces the cursor model and the clean_post_html() function reused later in the importer. +see_also: blockparser | BlockParser | Parse block comments first, then rewrite the HTML inside each block. +see_also: markdown | Markdown | Convert Markdown to blocks before polishing generated HTML. +see_also: dataliberation | DataLiberation | Rewrite URLs and media references during import/export pipelines. +--- + +A pure-PHP HTML5 parser and tag rewriter mirroring WordPress core's HTML API. Treat HTML the way browsers do — without libxml2, DOMDocument, or regex hacks — and rewrite attributes in a single linear pass. ## Why this exists -Modifying HTML in PHP usually means one of two things: string manipulation (fragile, breaks on any attribute ordering or whitespace variation) or loading the DOM extension (which requires libxml2, triggers errors on valid HTML5 that doesn't conform to XML rules, and mangles the document in the process). +

    WordPress runs HTML fragments through filters every time a request renders: post content, block markup, comments, excerpts, widgets, feeds, imported documents. Those fragments can omit <html> and <body>, close tags implicitly, or mix browser-correct markup with author mistakes that DOMDocument and regular expressions do not model well.

    -WordPress needed a third option: a parser that can safely scan and modify real-world HTML — including malformed markup — without any native extension, without loading the whole document into memory, and without altering content it wasn't asked to change. The result is `WP_HTML_Tag_Processor` and `WP_HTML_Processor`, both mirrored here from WordPress core for use outside WordPress. +

    The HTML component gives WordPress-style code the same parsing model WordPress core uses: a browser-compatible tokenizer and tree-aware processor that run in pure PHP. Choose it for exact-byte rewrites, imperfect fragments, and post-content filters where a full DOM would do too much work.
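For orientation, the tag-rewriting loop at the heart of this model looks like the sketch below. It uses WordPress core's WP_HTML_Tag_Processor method names (next_tag(), set_attribute(), get_updated_html()), which this component mirrors; the package may expose the class under its own namespace.

```php
<?php
// Add loading="lazy" to every <img> in a fragment, leaving all other
// bytes untouched. next_tag() scans forward; set_attribute() queues a
// lexical update; get_updated_html() applies the queued changes.
$processor = new WP_HTML_Tag_Processor(
	'<div><img src="a.jpg"><img src="b.jpg"></div>'
);
while ( $processor->next_tag( 'img' ) ) {
	$processor->set_attribute( 'loading', 'lazy' );
}
echo $processor->get_updated_html();
```

No tree is built for this: the processor scans forward through the tokens and rewrites attributes in place, which is why it stays fast on large fragments.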

    -The key design insight is that most HTML processing tasks don't need a full DOM tree. You want to find a tag and change one of its attributes. You want to add a class to every ``. You don't need to understand the document structure for that — you just need to scan forward efficiently. `WP_HTML_Tag_Processor` handles that case. When you do need structure — "find the `` inside a `
    ` inside a `