diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md new file mode 100644 index 0000000000..fe53b052cf --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_index.md @@ -0,0 +1,83 @@ +--- +title: Deploy a machine learning model to an NPU-capable system with Topo + +draft: true +cascade: + draft: true + +description: Use Topo to deploy a web application on Cortex-A that triggers a MobileNetV2 image classifier running as Cortex-M firmware with Ethos-U65 NPU acceleration. + +minutes_to_complete: 60 + +who_is_this_for: This is an introductory topic for embedded, edge, and cloud software developers who want to deploy machine learning workloads to heterogeneous Arm-based Linux targets using Topo. + +learning_objectives: + - Explain how Topo deploys an application that spans Cortex-A, Cortex-M, and Ethos-U + - Prepare an NXP FRDM i.MX 93 board for remoteproc-runtime and shared-memory inference + - Clone and deploy the topo-imx93-npu-deployment template + - Describe how the Template is bootstrapped from Compose services, Remoteproc Runtime metadata, and Topo arguments + - Run image classification from a browser and verify that inference is executed by the Cortex-M33 firmware + +prerequisites: + - A host machine (x86 or Arm) with Linux, macOS, or Windows + - An NXP FRDM i.MX 93 target board accessible over SSH with root access + - Docker installed on the host and target. For installation steps, see [Install Docker](/install-guides/docker/). + - lscpu installed on the target (pre-installed on most Linux distributions) + - Topo installed on the host. For installation steps, see [Deploy containerized workloads to Arm-based Linux targets with Topo](/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/). 
+ - Basic familiarity with containers, SSH, and CLI tools + +author: Tomas Agustin Gonzalez Orlando + +### Tags +skilllevels: Introductory +subjects: Containers and Virtualization +armips: + - Cortex-A + - Cortex-M + - Ethos-U +tools_software_languages: + - Topo + - Docker + - SSH + - ExecuTorch + - remoteproc-runtime +operatingsystems: + - Linux + - macOS + - Windows + +### Cross-platform metadata only +shared_path: true +shared_between: + - servers-and-cloud-computing + - laptops-and-desktops + - embedded-and-microcontrollers + +further_reading: + - resource: + title: Topo repository + link: https://github.com/arm/topo + type: documentation + - resource: + title: Topo template format + link: https://github.com/arm/topo-template-format + type: documentation + - resource: + title: Topo releases + link: https://github.com/arm/topo/releases/latest + type: website + - resource: + title: remoteproc-runtime + link: https://github.com/arm/remoteproc-runtime + type: documentation + - resource: + title: ExecuTorch + link: https://docs.pytorch.org/executorch/stable/index.html + type: documentation + +### FIXED, DO NOT MODIFY +# ================================================================================ +weight: 1 # _index.md always has weight of 1 to order correctly +layout: "learningpathall" # All files under learning paths have this same wrapper +learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content. 
+--- diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_next-steps.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_next-steps.md new file mode 100644 index 0000000000..727b395ddd --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/_next-steps.md @@ -0,0 +1,8 @@ +--- +# ================================================================================ +# FIXED, DO NOT MODIFY THIS FILE +# ================================================================================ +weight: 21 # The weight controls the order of the pages. _index.md always has weight 1. +title: "Next Steps" # Always the same, html page title. +layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing. +--- diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md new file mode 100644 index 0000000000..b2c662e0ce --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/build-the-template.md @@ -0,0 +1,310 @@ +--- +title: Build the Topo Template from scratch +weight: 4 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Start from the application pieces + +The `topo-imx93-npu-deployment` repository is a Compose project with Topo metadata at the root. The Topo-specific part is not a replacement for Compose. The services still describe container builds, dependencies, ports, volumes, and runtime settings. The `x-topo` block adds the metadata Topo uses to identify the Template, check target requirements, and prompt for configuration. + +The project has three implementation areas: + +- `executorch-runner/`: builds the ExecuTorch `.pte` program and the Cortex-M33 firmware ELF. +- `webapp/`: builds the Flask application that stages memory and sends `RUN` commands over `RPMsg`. 
+- `compose.yaml`: connects the build artifacts, runtime services, Remoteproc Runtime settings, and Topo metadata. + +When bootstrapping this Template from scratch, first make the project work as a normal Compose build. Then add the `x-topo` metadata that lets Topo deploy it consistently to an Arm64 target. + +## Install the Topo Template authoring skills + +The [Topo Template Format](https://github.com/arm/Topo-Template-Format) repository includes public authoring skills for agents that support skill installation: + +- `topo-template-context`: provides Topo and Topo Template reference context for `x-topo` metadata, schema, docs, and CLI Template behavior. +- `topo-template-bootstrap`: converts a Compose repository into a Topo Template by adding or improving `compose.yaml` and `x-topo` metadata. +- `topo-template-lint`: reviews a Topo Template for schema correctness, metadata consistency, deployment success messages, and build argument wiring. + +Install the skills with `npx skills`: + +```bash +npx skills add arm/topo-template-format +``` + +If your agent does not use `npx skills`, clone the Template Format repository and manually copy or symlink the directories under `skills/` into your agent's skills directory: + +```bash +git clone https://github.com/arm/Topo-Template-Format.git +``` + +Restart your agent after installing or updating the skills. + +You can then use the skills as part of the Template authoring flow. From the root of any Compose project, ask your agent to use `topo-template-bootstrap`: + +```output +Use topo-template-bootstrap on this repository. +Treat the root compose.yaml as the Template root. +Preserve plain docker compose behavior. +Add x-topo metadata only where it reflects the actual services, hardware requirements, and build arguments. +``` + +After bootstrap, ask the agent to use `topo-template-lint`: + +```output +Use topo-template-lint on topo-imx93-npu-deployment. +Validate compose.yaml against the Topo Template Format schema. 
+Check README alignment, deployment_success_message, Remoteproc Runtime metadata, and x-topo.args wiring. +``` + +The lint pass should confirm that the Template has a root-level `x-topo.name`, that non-remoteproc services use `platform: linux/arm64`, that `cm33-runner` uses the Remoteproc Runtime annotation, and that every `x-topo.args` entry is carried into Compose or Docker build arguments where appropriate. + +## Create the runner build pipeline + +The `executorch-runner/Dockerfile` is a multi-stage Dockerfile. It builds two artifacts from one build context: + +- `mv2_ethosu65_256.pte`: the MobileNetV2 ExecuTorch program lowered for `ethos-u65-256`. +- `executorch_runner_cm33.elf`: the Cortex-M33 firmware image loaded by Linux `remoteproc`. + +The first half of the Dockerfile builds the model artifact: + +```Dockerfile +FROM build-base AS executorch-base +... +FROM executorch-base AS pte-builder +... +RUN source /workspace/executorch/examples/arm/arm-scratch/setup_path.sh && \ + python /usr/local/bin/export_mv2_imx93.py + +FROM busybox:1.36 AS pte-artifacts +COPY --from=pte-builder /workspace/build/mv2-imx93/mv2_ethosu65_256.pte /artifacts/mv2_ethosu65_256.pte +``` + +The second half builds and packages the firmware: + +```Dockerfile +FROM build-base AS runner-base +ARG MCUXSDK_MANIFEST_URL=https://github.com/nxp-mcuxpresso/mcuxsdk-manifests.git +ARG MCUXSDK_MANIFEST_REV=v25.09.00 +... +FROM runner-base AS runner-builder +RUN /usr/local/bin/build-runner.sh /artifacts + +FROM scratch AS runner-runtime +COPY --from=runner-builder /artifacts/executorch_runner_cm33.elf /executorch_runner_cm33.elf +ENTRYPOINT ["/executorch_runner_cm33.elf"] +``` + +The `runner-runtime` stage is intentionally a `scratch` image. The only payload is the ELF file. When the service starts with `runtime: io.containerd.remoteproc.v1`, containerd uses Remoteproc Runtime instead of a normal Linux process runtime. 
Remoteproc Runtime passes the ELF entrypoint to the Linux `remoteproc` driver, and the `imx-rproc` driver loads and releases the Cortex-M33. + +The project also applies patches before building the runner. One patch changes the MCUX SDK RAM linker and startup behavior so initialized data is loaded in-place by `remoteproc` rather than copied from a flash-style load address. The runner patches add RPMsg stability fixes and trace output used by the web application. + +## Add artifact-only Compose services + +At the root of the Template, create normal Compose services for the build outputs: + +```yaml +services: + pte-artifacts: + platform: linux/arm64 + scale: 0 + build: + context: executorch-runner + dockerfile: Dockerfile + target: pte-artifacts + + runner-artifacts: + platform: linux/arm64 + scale: 0 + build: + context: executorch-runner + dockerfile: Dockerfile + target: runner-artifacts +``` + +These services are not runtime application containers. `scale: 0` keeps them out of the running deployment while still making their build targets available to the rest of the Compose project. + +The web application imports the PTE artifact as a BuildKit additional context: + +```yaml +services: + webapp: + platform: linux/arm64 + build: + context: . + dockerfile: Dockerfile + additional_contexts: + pte_artifacts: service:pte-artifacts +``` + +The webapp Dockerfile then copies from that context: + +```Dockerfile +COPY --from=pte_artifacts /artifacts/mv2_ethosu65_256.pte /opt/mv2-imx93/mv2_ethosu65_256.pte +``` + +This keeps the model export pipeline separate from the Flask app while still producing one deployable webapp image. 
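The wiring between the artifact-only services and the web application build can be sanity-checked with a few lines of Python. The sketch below is illustrative only. It assumes the Compose file has already been loaded into a plain dictionary (the literal here mirrors the YAML fragments above, trimmed to the relevant keys), and `unresolved_service_contexts` is a hypothetical helper name, not part of Topo or Docker Compose. It handles only the mapping form of `additional_contexts`.

```python
# Hypothetical helper for illustration; not part of Topo or Docker Compose.
def unresolved_service_contexts(compose: dict) -> list[str]:
    """List additional_contexts entries that point at an undefined service."""
    services = compose.get("services", {})
    problems = []
    for name, spec in services.items():
        build = spec.get("build")
        if not isinstance(build, dict):
            continue  # short-form build (a plain path) has no additional contexts
        contexts = build.get("additional_contexts", {})
        if not isinstance(contexts, dict):
            continue  # list form is out of scope for this sketch
        for ctx_name, source in contexts.items():
            if source.startswith("service:"):
                target = source[len("service:"):]
                if target not in services:
                    problems.append(f"{name}.{ctx_name} -> {target}")
    return problems


# Dictionary mirroring the YAML fragments above (assumed, trimmed).
compose = {
    "services": {
        "pte-artifacts": {"scale": 0, "build": {"target": "pte-artifacts"}},
        "runner-artifacts": {"scale": 0, "build": {"target": "runner-artifacts"}},
        "webapp": {
            "build": {
                "additional_contexts": {"pte_artifacts": "service:pte-artifacts"}
            }
        },
    }
}

print(unresolved_service_contexts(compose))  # prints []
```

An empty list means every `service:` context refers to a service defined in the same project. Renaming `pte-artifacts` without updating the `webapp` build would make the entry show up in the result.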
+ +## Add the remote processor service + +The Cortex-M33 firmware is represented as another Compose service: + +```yaml +services: + cm33-runner: + platform: linux/arm64 + build: + context: executorch-runner + dockerfile: Dockerfile + target: runner-runtime + runtime: io.containerd.remoteproc.v1 + annotations: + remoteproc.name: imx-rproc +``` + +This is the key heterogeneous deployment hook. The service is still built by Docker, but it is not launched as a Linux userspace process. The `runtime` selects the containerd Remoteproc Runtime shim, and `remoteproc.name: imx-rproc` selects the i.MX 93 remote processor driver. + +After this service starts, Linux exposes the RPMsg device used by the Cortex-A web app. The Flask code waits for `/dev/ttyRPMSG*`, writes the `.pte` file to `0xC0000000`, writes the input tensor to `0xC036D000`, sends `RUN\n` over RPMsg, and parses the `CM33:` response lines into top-1 and top-5 ImageNet results. + +## Add the web application service + +The web application service extends `webapp/compose.yaml` from the root Compose file: + +```yaml +services: + webapp: + platform: linux/arm64 + extends: + file: webapp/compose.yaml + service: webapp + depends_on: + - cm33-runner +``` + +The extended service is privileged and mounts `/sys` and `/dev`: + +```yaml +services: + webapp: + privileged: true + ports: + - "${WEBAPP_PORT:-3001}:3000" + volumes: + - /sys:/sys + - /dev:/dev +``` + +Those mounts are required because the app checks `/proc/device-tree`, reads remoteproc state through `/sys/class/remoteproc`, talks to `/dev/ttyRPMSG*`, writes model and tensor data through `/dev/mem`, and checks for `/dev/ethosu0`. + +## Add Topo metadata + +After the Compose services are in place, add the root-level `x-topo` block: + +```yaml +x-topo: + name: "i.MX93 ExecuTorch runner" + description: "Runs a Cortex-A web application that sends image inference commands to a resident CM33 ExecuTorch runner over RPMsg." 
+ features: + - "remoteproc-runtime" +``` + +Keep `x-topo` at the root of `compose.yaml`, not under `services`. The `features` entry is what tells Topo this Template needs a target with Remoteproc Runtime support. That is why `topo health` checks for: + +```output +Remoteproc Runtime: ✅ (remoteproc-runtime) +Remoteproc Shim: ✅ (containerd-shim-remoteproc-v1) +Subsystem Driver (remoteproc): ✅ (imx-rproc) +``` + +You can also add a deployment success message so users know exactly what to do after deployment: + +```yaml +x-topo: + deployment_success_message: | + The i.MX93 ExecuTorch runner is deployed. + Open http://:3001 and classify an ImageNet image. +``` + +## Expose project configuration + +Topo arguments are metadata for project parameters. Compose still carries the values into the build. + +The current Template exposes optional cache image parameters: + +```yaml +x-topo: + args: + EXECUTORCH_BASE_CACHE_IMAGE: + description: Optional GHCR image used as a BuildKit cache source for the ExecuTorch PTE build. + required: false + default: ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04 + IMX93_RUNNER_BUILD_CACHE_IMAGE: + description: Optional GHCR image used as a BuildKit cache source for the CM33 runner build. + required: false + default: ghcr.io/arm-examples/topo-imx93-npu-deployment/imx93-runner-build:mcux-v25.09.00-armgcc14.2-ubuntu24.04 +``` + +Those values are used by Compose interpolation in `build.cache_from`: + +```yaml +cache_from: + - ${EXECUTORCH_BASE_CACHE_IMAGE:-ghcr.io/arm-examples/topo-imx93-npu-deployment/executorch-base:et-v1.2.0-ubuntu24.04} +``` + +For build-time configuration, wire Topo arguments into standard Compose `build.args`. 
The runner Dockerfile already declares project-specific arguments for the MCUX SDK manifest: + +```Dockerfile +ARG MCUXSDK_MANIFEST_URL=https://github.com/nxp-mcuxpresso/mcuxsdk-manifests.git +ARG MCUXSDK_MANIFEST_REV=v25.09.00 +``` + +To expose the SDK revision through Topo, add matching Compose build args to the services that build `runner-base` descendants: + +```yaml +services: + runner-artifacts: + build: + args: + MCUXSDK_MANIFEST_REV: ${MCUXSDK_MANIFEST_REV:-v25.09.00} + + cm33-runner: + build: + args: + MCUXSDK_MANIFEST_REV: ${MCUXSDK_MANIFEST_REV:-v25.09.00} + +x-topo: + args: + MCUXSDK_MANIFEST_REV: + description: MCUX SDK manifest revision used to build the Cortex-M33 runner. + required: false + default: v25.09.00 +``` + +With that wiring, Topo can prompt for the value when the Template is cloned or extended, Compose passes the value into Docker BuildKit, and the Dockerfile consumes it through `ARG MCUXSDK_MANIFEST_REV`. + +Use this only for configuration that should be chosen at Template setup time. Runtime-only settings, such as `WEBAPP_PORT`, should remain normal Compose environment interpolation unless you intentionally want Topo to collect them as build-time parameters. + +## Lint the Template + +Before publishing the Template, validate the root Compose file: + +```bash +check-jsonschema \ + --schemafile ../topo-template-format/schema/topo-template-format.json \ + compose.yaml +``` + +Then review the Template the same way Topo Template linting does: + +- The Template root contains `compose.yaml`. +- `compose.yaml` contains a root-level `x-topo.name`. +- Non-remoteproc services set `platform: linux/arm64`. +- The `cm33-runner` service uses `runtime: io.containerd.remoteproc.v1` and `remoteproc.name: imx-rproc`. +- `x-topo.description` matches the README and the actual Cortex-A to Cortex-M33 RPMsg flow. +- `x-topo.features` includes `remoteproc-runtime`. 
+- `x-topo.args` entries are either consumed through Compose interpolation, such as the cache image values, or wired into `services..build.args` and declared as Dockerfile `ARG` instructions. +- `deployment_success_message` tells the user to open the web app on the configured target port. + +## What you've accomplished + +You now understand how the `topo-imx93-npu-deployment` Template is built from ordinary Compose services plus Topo metadata: artifact-only build stages produce the model and firmware, Remoteproc Runtime starts the Cortex-M33 ELF, RPMsg connects the processors at runtime, and `x-topo.args` provides a path for setup-time configuration without replacing Docker or Compose. diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md new file mode 100644 index 0000000000..f99db902ef --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/deploy.md @@ -0,0 +1,212 @@ +--- +title: Deploy the project +weight: 3 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Prepare the target + +Before deploying the Template, confirm that the FRDM i.MX 93 board is reachable from your host and that it's ready for deployment: + +```bash +topo health --target @ +``` +Replace `` with the IP address or hostname of your board. + +Resolve any errors before continuing. 
+ +The target section should include successful checks similar to: + +```output +Host +---- +Topo: ✅ (topo) +SSH: ✅ (ssh) +Curl: ✅ (curl) +Container Engine: ✅ (docker) + +Target +------ +Destination: ssh:// +Connectivity: ✅ +Container Engine: ✅ (docker) +Remoteproc Runtime: ✅ (remoteproc-runtime) +Remoteproc Shim: ✅ (containerd-shim-remoteproc-v1) +Hardware Info: ✅ (lscpu) +Subsystem Driver (remoteproc): ✅ (imx-rproc) +``` + +If `remoteproc-runtime` is missing, install it with Topo: + +```bash +topo install remoteproc-runtime --target @ +``` + +Run the health check again: + +```bash +topo health --target @ +``` + +## Reserve memory in the device tree + +The web application and Cortex-M33 firmware exchange data through reserved physical memory. The target device tree must reserve memory for the model/input buffer and for Ethos-U65. In this section, you modify the device tree and reboot the target so that the changes take effect. + +{{% notice Warning %}} +Back up the board's original device tree before modifying it. The exact boot partition can differ between Linux images, so check the paths on your board before copying files. +{{% /notice %}} + +On your host, create a working directory and dump the live device tree from the target: + +```bash +mkdir -p devicetree +ssh @ 'cat /sys/firmware/fdt' > devicetree/live.dtb +dtc -I dtb -O dts -o devicetree/live.dts devicetree/live.dtb +``` + +Open `devicetree/live.dts` in an editor. + +Under `remoteproc-cm33`, add the CM33 power domain if it is not already present: + +```dts +power-domains = <0x61>; +``` + +Under `reserved-memory`, add the model memory range: + +```dts +model@c0000000 { + reg = <0x00 0xc0000000 0x00 0x400000>; + no-map; +}; +``` + +Update the Ethos-U reserved-memory node so it is reserved and not reusable: + +```dts +ethosu_region@A8000000 { + compatible = "shared-dma-pool"; + reg = <0x00 0xa8000000 0x00 0x8000000>; + no-map; + phandle = <0x60>; +}; +``` + +Add `iomem=relaxed` to `chosen.bootargs`. 
For example: + +```dts +bootargs = "clk-imx93.mcore_booted console=ttyLP0,115200 earlycon root=/dev/mmcblk1p2 rootwait rw iomem=relaxed"; +``` + +Build the patched device tree: + +```bash +dtc -I dts -O dtb -o devicetree/patched.dtb devicetree/live.dts +``` + +Copy it to the board: + +```bash +scp devicetree/patched.dtb @:/tmp/patched.dtb +``` + +Install it on the board. Adjust the boot partition path if your image uses a different location: + +```bash +ssh @ +cp /run/media/boot-mmcblk1p1/imx93-11x11-frdm.dtb \ + /run/media/boot-mmcblk1p1/imx93-11x11-frdm.dtb.bak +cp /tmp/patched.dtb \ + /run/media/boot-mmcblk1p1/imx93-11x11-frdm.dtb +sync +reboot +``` + +After the board reboots, run the Topo health check again from the host: + +```bash +topo health --target @ +``` + +## Clone the Template + +Clone the Template onto your host: + +```bash +topo clone https://github.com/Arm-Examples/topo-imx93-npu-deployment.git +``` + +Topo prompts for optional build cache image arguments: + +```output +EXECUTORCH_BASE_CACHE_IMAGE +IMX93_RUNNER_BUILD_CACHE_IMAGE +``` + +Accept the defaults unless you have your own cache images. + +Enter the project directory: + +```bash +cd topo-imx93-npu-deployment +``` + +## Deploy to the board + +Deploy the project to your target: + +```bash +topo deploy --target @ +``` + +If not pulling from the cache, the first build can take a long time and requires about 25 GB of free disk space. It downloads and builds ExecuTorch, the Arm GNU toolchain, MCUX SDK components, RPMsg-Lite, and the Cortex-M33 runner sources. Later builds are faster when Docker can reuse local cache layers or import the configured GHCR cache layers. + +During deployment, Topo builds the required images, transfers them to the target, starts the Cortex-M33 firmware through `remoteproc-runtime`, and starts the web application. + +When deployment succeeds, the output includes a successful service startup. 
You can also check the deployed services: + +```bash +topo ps --target @ +``` + +## Open the web application + +Open the web application in a browser: + +```output +http://:3001 +``` + +The application shows: + +- an image selector +- a **Classify** button +- board prerequisite checks +- classification results +- an expandable analysis section with runtime details + +Select an image from an ImageNet-supported class, then click **Classify**. A successful run returns top-1 and top-5 ImageNet classifications. + +If you need to use a different target port, set `WEBAPP_PORT` when deploying: + +```bash +WEBAPP_PORT=3002 topo deploy --target @ +``` + +Then open: + +```output +http://:3002 +``` + +You should see something similar to: + + +![Screenshot of the web interface running on an Arm-based target, showing an image and the model response. This confirms successful deployment and provides a visual reference for the expected result.#center](topo_npu_classifier.png "Image classification as seen in the web app") + +## What you've accomplished + +You have prepared an FRDM i.MX 93 board for shared-memory NPU inference, deployed the `topo-imx93-npu-deployment` Template with Topo, started Cortex-M33 firmware through `remoteproc-runtime`, and used a browser-based application to run MobileNetV2 classification with Ethos-U65 acceleration. +Next, you will review how this project is structured as a Topo Template. 
diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md new file mode 100644 index 0000000000..1baa3282ca --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/overview.md @@ -0,0 +1,74 @@ +--- +title: Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration using Topo +weight: 2 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Get started + +Before you begin, complete the Learning Path [Deploy containerized workloads to Arm-based Linux targets with Topo](/learning-paths/cross-platform/deploy-containerized-workloads-with-topo/) to learn how to install Topo, run host and target health checks, inspect a target, list compatible Templates, and deploy a containerized workload. + +For more background on the underlying NPU example, read [Deploy ExecuTorch firmware on NXP FRDM i.MX 93 for Ethos-U65 acceleration](/learning-paths/embedded-and-microcontrollers/observing-ethos-u-on-nxp/). You do not need to complete that Learning Path before using this one, but it helps explain the model, firmware, and [Ethos-U65](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u65) execution flow. + +## What is Topo? + +[Topo](https://github.com/arm/topo) is an open-source command-line tool developed by Arm for deploying projects to an Arm-based Linux target over SSH. Topo builds container images on the host, transfers them to the target, and starts the services on the target. Topo can also build and deploy directly on the target. + +## What you'll learn + +In this Learning Path, you will deploy the [topo-imx93-npu-deployment](https://github.com/Arm-Examples/topo-imx93-npu-deployment) Topo Template to an NXP FRDM i.MX 93 board. + +The Template builds and deploys a browser-based MobileNetV2 image classifier. The user interface runs on the Cortex-A Linux side of the SoC. 
The inference runner is packaged as Cortex-M33 firmware and is started by [remoteproc-runtime](https://github.com/arm/remoteproc-runtime). The model is exported to an [ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) `.pte` [file](https://docs.pytorch.org/executorch/stable/pte-file-format.html) for Ethos-U65 NPU acceleration. + +### What does deploying the topo-imx93-npu-deployment Template do? + +Deploying the Template starts two runtime services on the target: + +- `webapp`: a web application running on the Cortex-A Linux host. It receives the image to classify. +- `cm33-runner`: Cortex-M33 firmware that receives the image from the web application and runs the machine learning classification model on it. + +When you select an image in the browser and click **Classify**, the web application: + +1. Resizes and normalizes the selected image into an input tensor compatible with the [MobileNetV2](https://arxiv.org/abs/1801.04381) model. +2. Writes the ExecuTorch program and input tensor into reserved physical memory. +3. Sends a `RUN` command to the Cortex-M33 runner over `RPMsg`. +4. Waits for the Cortex-M33 firmware to run inference using Ethos-U65 acceleration. +5. Displays the top-1 and top-5 ImageNet classification results in the browser. + +## System architecture + +The deployed application spans three processing domains on the i.MX 93: + +- **Cortex-A Linux host**: runs Docker, Topo-deployed containers, the Flask web app, and the Linux `remoteproc` and `RPMsg` interfaces. +- **Cortex-M33 firmware domain**: runs the ExecuTorch runner firmware loaded by `remoteproc-runtime`. +- **Ethos-U65 NPU**: accelerates delegated neural network operators from the ExecuTorch MobileNetV2 program. 
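The web application reports top-1 and top-5 results. As a minimal sketch of what that ranking means, the Python below sorts per-class scores and keeps the best five. The scores and the six-label subset are made up for illustration, and `top_k` is a hypothetical helper. How the Cortex-M33 runner actually encodes its output is not described here, so treat this as an explanation of the ranking only.

```python
# Hypothetical helper: rank (label, score) pairs and keep the best k.
def top_k(scores: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up example values; a real ImageNet model produces 1000 scores.
labels = ["tabby cat", "tiger cat", "Egyptian cat", "lynx", "golden retriever", "beagle"]
scores = [4.1, 3.2, 2.7, 0.9, -1.3, -0.4]

top5 = top_k(scores, labels, k=5)
top1 = top5[0]

print("top-1:", top1)
for label, score in top5:
    print(f"{label}: {score}")
```

Top-1 is simply the first entry of the top-5 list, so a single ranking pass serves both views in the browser.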
+ +The high-level data flow is: + +```output +Browser + | + v +Flask web application on Cortex-A Linux + | + | writes .pte and input tensor to reserved memory + | sends RUN over RPMsg + v +Cortex-M33 ExecuTorch runner firmware + | + | delegates supported operators + v +Ethos-U65 NPU + | + v +Cortex-M33 returns classification results over RPMsg + | + v +Browser displays ImageNet top-1 and top-5 results +``` + +## What you've accomplished and what's next + +You now understand what the Topo Template deploys and how the Cortex-A, Cortex-M33, and Ethos-U65 parts work together. Next, you will prepare the i.MX 93 target and deploy the Template with Topo. diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png new file mode 100644 index 0000000000..f744e12c64 Binary files /dev/null and b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/topo_npu_classifier.png differ diff --git a/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md new file mode 100644 index 0000000000..0ffefc32f7 --- /dev/null +++ b/content/learning-paths/cross-platform/deploy-ml-model-to-npu-with-topo/what-are-the-toolchains.md @@ -0,0 +1,86 @@ +--- +title: Understand the toolchains +weight: 5 + +### FIXED, DO NOT MODIFY +layout: learningpathall +--- + +## Understand the build and runtime pieces + +The `topo-imx93-npu-deployment` Template combines several toolchains. Topo hides much of the deployment plumbing, but it is useful to understand what is being built and where each component runs. 
+ +## ExecuTorch + +[ExecuTorch](https://docs.pytorch.org/executorch/stable/index.html) is the PyTorch Edge runtime for deploying PyTorch models to edge devices, using any acceleration hardware that is available on the target device. In this Template, ExecuTorch is used in two places: + +- At build time, the Template exports a MobileNetV2 model to an ExecuTorch `.pte` program. +- At run time, the Cortex-M33 firmware loads and executes that `.pte` program. + +The export pipeline uses the ExecuTorch Arm backend and targets `ethos-u65-256`. The model is quantized and lowered so supported neural network operators can be delegated to the Ethos-U65 NPU. The generated file is: + +```output +mv2_ethosu65_256.pte +``` + +The web application includes this `.pte` file in its container image. During inference, it writes the file into the reserved physical memory range starting at `0xC0000000`, where the Cortex-M33 runner can read it. + +## Cortex-M33 firmware runner + +The firmware runner is built as: + +```output +executorch_runner_cm33.elf +``` + +This firmware runs on the Cortex-M33 core. It waits for commands coming from the Linux web application over `RPMsg`, reads the input image tensors from reserved memory, executes inference through ExecuTorch, and writes classification output back over `RPMsg`. + +The Template packages the firmware as the entrypoint of the `cm33-runner` image: + +```yaml +cm33-runner: + runtime: io.containerd.remoteproc.v1 + annotations: + remoteproc.name: imx-rproc +``` + +The `runtime: io.containerd.remoteproc.v1` setting tells containerd to use the remote processor runtime instead of the normal Linux container runtime. The `remoteproc.name` annotation identifies the target remote processor driver, `imx-rproc`. + +## remoteproc-runtime + +Linux includes a `remoteproc` framework for loading and controlling auxiliary processors such as the Cortex-M33 on the i.MX 93. 
`remoteproc-runtime` adds an Open Container Initiative interface on top of this framework, allowing firmware to be packaged and launched using container tooling. + +Topo uses `remoteproc-runtime` when deploying the `cm33-runner` service. The deployment flow is: + +1. Topo builds the `runner-runtime` image containing `executorch_runner_cm33.elf`. +2. Topo starts the image on the target. +3. containerd uses `io.containerd.remoteproc.v1`. +4. `remoteproc-runtime` passes the ELF file to the Linux `remoteproc` driver. +5. The kernel loads the ELF segments and releases the Cortex-M33. + +This is why the target must pass the `Remoteproc Runtime`, `Remoteproc Shim`, and `Subsystem Driver (remoteproc)` checks in `topo health`. + +## RPMsg + +`RPMsg` is the communication channel between the Cortex-A Linux application and the Cortex-M33 firmware. The web application sends a `RUN` command over a `/dev/ttyRPMSG*` device. The firmware replies with status and classification output. + +If the deployment succeeds but classification times out, inspect the web app's board checks and the target's `RPMsg` devices. The application expects an `RPMsg` TTY to appear after the Cortex-M33 firmware starts. + +## Shared reserved memory + +The web application and firmware exchange model and input data through reserved physical memory. The Template expects the target device tree to reserve: + +- `model@c0000000`: 4 MiB for the ExecuTorch `.pte` file and input tensor. +- `ethosu_region@A8000000`: 128 MiB for Ethos-U65 use. + +The web application checks these ranges at startup through `/proc/device-tree`. It also checks for `/dev/mem`, `/dev/ethosu0`, the `imx-rproc` remote processor, the `.pte` file, and ImageNet labels. + +## Web application + +The `webapp` service is a Python Flask application. 
It serves the browser UI, preprocesses selected images, stages memory for the images sent to the Cortex-M33 runner, sends inference commands over `RPMsg`, and renders the ImageNet top-1 and top-5 results. + +By default, the service maps target port `3001` to container port `3000`. + +## What you've accomplished + +You now understand the major toolchains and runtime interfaces used by the Template: ExecuTorch, the Cortex-M33 firmware runner, remoteproc-runtime, RPMsg, reserved memory, and the Flask web application.
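As a closing cross-check of the shared reserved memory figures, the arithmetic below recomputes the region sizes from the device tree `reg` values and looks at how the 4 MiB model region is split. The base address `0xC0000000` and the input tensor address `0xC036D000` are the ones the Template walkthrough gives for the web application. The input tensor shape (1 x 3 x 224 x 224, 32-bit floats) is an assumption based on the usual MobileNetV2 input and is not stated in this Learning Path.

```python
MIB = 1024 * 1024

MODEL_BASE = 0xC0000000   # model@c0000000 base address
MODEL_SIZE = 0x400000     # from reg = <0x00 0xc0000000 0x00 0x400000>
ETHOSU_SIZE = 0x8000000   # from reg = <0x00 0xa8000000 0x00 0x8000000>
TENSOR_ADDR = 0xC036D000  # where the web app writes the input tensor

# Assumption: 1 x 3 x 224 x 224 float32 input (4 bytes per element).
ASSUMED_TENSOR_BYTES = 1 * 3 * 224 * 224 * 4

# Space available for the .pte program before the tensor starts.
pte_capacity = TENSOR_ADDR - MODEL_BASE
# Space left for the tensor between its address and the end of the region.
tensor_capacity = MODEL_BASE + MODEL_SIZE - TENSOR_ADDR

print(MODEL_SIZE // MIB, ETHOSU_SIZE // MIB)  # prints 4 128
print(pte_capacity, tensor_capacity)          # prints 3592192 602112
print(ASSUMED_TENSOR_BYTES <= tensor_capacity)  # prints True
```

Under that assumption the tensor fills the remaining 602,112 bytes exactly, which leaves about 3.4 MiB for the `.pte` program. A larger model or a different input shape would need a larger `model@c0000000` reservation and a matching tensor address.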