# Linux driver stack for Arm® Ethos™-U

## Glossary of terms, Abbreviations and Acronyms

* **NPU:** Refers to an Ethos-U Neural Processing Unit (NPU).
* **Subsystem:** A hardware configuration where the host CPU communicates with a subsystem consisting of an Arm Cortex®-M CPU and an NPU.
* **ML Island:** Another term for subsystem.
* **Direct drive:** A hardware configuration where the host CPU communicates directly with the NPU.

## Introduction

The Linux driver stack for the Ethos-U NPU contains drivers, libraries and applications to make the NPU available for use in Linux userspace.

## Hardware configurations

The Linux driver stack supports using the NPU in two mutually exclusive hardware configurations.

### Subsystem configuration

In the subsystem configuration, a subsystem consisting of a Cortex-M CPU and an NPU is used. In this configuration, the Cortex-M manages the NPU and the host CPU communicates requests to the subsystem using a mailbox.

![Subsystem configuration](docs/ethos-u_subsystem_config.png "Subsystem hardware configuration")

### Direct drive configuration

In the direct drive configuration, the host CPU is responsible for managing the NPU directly.

**Note:** Only the Ethos-U65 and Ethos-U85 NPUs are supported in the direct drive configuration.

![Direct drive configuration](docs/ethos-u_direct_config.png "Direct drive hardware configuration")

## Project folder structure

To make a clear separation between the different components in the driver stack, the source files are placed into separate directories according to their purpose. Components are only allowed to use the header files from other components' include folders.

* **cmake:** Contains toolchain files for the CMake build system
* **delegate:** TensorFlow Lite delegate to offload the custom Ethos-U operator to the NPU (only used in the direct drive configuration)
* **docs:** Documentation files
* **driver_library:** Library that provides a C++ and Python API for the NPU Linux kernel driver
* **kernel:** NPU Linux kernel driver
* **mailbox:** Linux kernel driver for the message handling unit used to pass messages between the host CPU and subsystem
* **remoteproc:** [Remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) based Linux kernel driver used to set up and communicate with the subsystem
* **tests:** Tests for the driver stack
* **thirdparty:** Third-party components
* **tools:** Tools used in the driver stack
* **utils:** Utility applications for the driver stack

# Linux Kernel drivers

The NPU kernel driver provides a [Userspace API (UAPI)](kernel/uapi/ethosu.h) that Linux userspace applications use to dispatch inferences to the NPU. How the inferences are executed on the NPU depends on the hardware configuration:

* In the subsystem configuration, the inference request is packaged as a message and passed to the subsystem. The subsystem is then responsible for executing the inference.
* In the direct drive configuration, the kernel driver programs the NPU directly to execute the requested inference.

The kernel driver detects which hardware configuration is used from the device tree and configures itself accordingly. For more information on how this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).

## Driver folder structure

To make a clear distinction between the implementation of the two hardware configurations in the kernel driver, the source and header files are separated into configuration-specific subfolders, and a common subfolder for everything that is shared between them.

* **common:** Files used by both configurations
* **direct:** Files for the direct drive configuration
* **rpmsg:** Files for the subsystem configuration
* **include:** Header files for the driver and UAPI

## Direct drive

In the direct drive configuration, the kernel driver is fully responsible for managing the NPU. This means it has to configure the NPU's memory interfaces and memory regions, handle NPU interrupts, handle power management, and manage the queuing and execution of inferences.

Unlike the subsystem configuration, where the firmware running on the Cortex-M CPU can be built with bundled networks, the direct drive configuration does not have any firmware and therefore does not support bundled networks.

The sequence diagram below shows an overview of how the direct drive implementation handles an inference.

![Direct drive sequence](docs/ethos-u_linux_direct_sequence.svg "Direct drive components and sequence")

### Device tree

The device tree below can be used as a reference for an NPU in the direct drive configuration.

**Note:** The `ethosu_mem_config` and `ethosu_axi_config` nodes are mutually exclusive and only one should appear in the device tree, depending on the NPU used.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the SRAM used by the NPU
        ethosu_sram: ethosu_sram@6c000000 {
            reg = <0 0x6c000000 0 0x200000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };
    };

    // NPU driver
    ethosu@6d700000 {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "arm,ethosu-direct";

        // Base address and size of NPU registers
        reg = <0 0x6d700000 0 0x2FFF>;

        memory-region = <&ethosu_reserved>;
        sram = <&ethosu_sram>;

        // Address mappings to translate between bus addresses (NPU) and physical host CPU addresses
        dma-ranges = <0 0x6c000000 0 0x6c000000 0 0x200000>,
                     <0 0x84000000 0 0x84000000 0 0x2000000>;

        interrupts = <0 168 4>;
        interrupt-names = "irq";

        // Memory region configuration
        region-cfgs = <3 3 0 3 3 3 3 3>;

        // Memory regions used for the command stream
        cs-region = <2>;

        // Memory interface configuration for Ethos-U85
        ethosu_mem_config {
            compatible = "arm,ethosu-mem-config";
            sram = <0 64 32>;
            ext = <1 64 32>;
            configs = <0 0 0>, <0 0 0>, <0 0 1>, <0 0 1>;
        };

        // Memory interface configuration for Ethos-U65
        ethosu_axi_config {
            compatible = "arm,ethosu-axi-config";
            // AXI port0
            // AXI port0
            // AXI port1
            // AXI port1
            configs = <0 0 64 32>, <0 0 64 32>, <0 0 64 32>, <0 0 64 32>;
        };
    };
};
```

## Subsystem

For the subsystem configuration, the NPU is fully managed by the firmware running on the subsystem Cortex-M CPU. The NPU kernel driver is only responsible for managing the requests from userspace and passing messages to the subsystem to perform the required actions.

The firmware running on the Cortex-M is typically implemented using a real-time operating system (RTOS) and the [OpenAMP](https://www.openampproject.org/) framework.
OpenAMP includes support for the [Remote Processor Framework (remoteproc)](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) and [Remote Processor Messaging (rpmsg)](https://www.kernel.org/doc/html/latest/staging/rpmsg.html), which are used to set up and communicate with the subsystem.

If one or more networks will be used frequently with the NPU, they can be bundled with the firmware and referenced with an index in the UAPI, to avoid having to pass the network data later.

The sequence diagram below shows an overview of how the subsystem implementation handles an inference.

![Subsystem sequence](docs/ethos-u_linux_sequence.svg "Subsystem (ML Island) components and sequence")

A set of kernel drivers is needed to get the subsystem running and to set up the message passing.

### Mailbox driver

The mailbox driver is responsible for managing the message handling unit (MHU) and passing messages between the host CPU and the Cortex-M CPU in the subsystem. The driver creates an interface using the [common mailbox framework](https://docs.kernel.org/driver-api/mailbox.html) from the Linux kernel to allow other drivers to pass messages using the MHU. Please refer to the files in the mailbox folder for more details.

### Remoteproc driver

The remoteproc driver is based on the remote processor framework and is responsible for loading the firmware binary, allocating memory for the firmware, configuring the firmware and booting the firmware on the Cortex-M CPU. The remoteproc driver uses the mailbox client interface created by the mailbox driver to communicate with the firmware. Please refer to [ethosu_remoteproc.c](remoteproc/ethosu_remoteproc.c) for more details.

Once the firmware has booted on the Cortex-M CPU, it creates an rpmsg-based communication channel between the NPU kernel driver and the firmware. Rpmsg uses shared buffers set up by the remoteproc driver to pass data between the NPU kernel driver and the firmware.

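Whether the firmware has been loaded and booted can usually be checked from userspace through the generic remoteproc sysfs interface of the Linux kernel. The sketch below is not part of the driver stack: the helper name `list_remoteprocs` is hypothetical, and it assumes the kernel exposes `/sys/class/remoteproc` with the standard `firmware` and `state` attributes. Which `remoteprocN` instance belongs to the Ethos-U subsystem depends on the system.

```bash
# Hypothetical helper: print every remoteproc instance with its firmware
# name and state (for example "offline", "running" or "crashed").
# The sysfs root can be overridden so the function can be tried against a
# test directory; it defaults to the standard location.
list_remoteprocs() {
    root="${1:-/sys/class/remoteproc}"
    for dir in "$root"/remoteproc*; do
        # Skip the unexpanded glob when no instance exists
        [ -d "$dir" ] || continue
        printf '%s firmware=%s state=%s\n' \
            "$(basename "$dir")" \
            "$(cat "$dir/firmware" 2>/dev/null)" \
            "$(cat "$dir/state" 2>/dev/null)"
    done
}
```

Calling `list_remoteprocs` without arguments on the target lists all remote processors known to the kernel. An instance reporting `state=running` indicates that its firmware has been booted.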
### NPU kernel driver

The NPU kernel driver handles all the userspace requests and passes messages to the firmware to perform the actions needed to fulfill them. The messages are passed to and from the firmware using the rpmsg communication channel that the firmware has set up. Please refer to [ethosu_rpmsg_mailbox.c](kernel/rpmsg/ethosu_rpmsg_mailbox.c) for more details about these messages.

To get access to the rpmsg communication channel, the NPU kernel driver registers that it needs the channel with the rpmsg driver in the Linux kernel when it is loaded. When the channel becomes available, the Linux kernel notifies the NPU kernel driver and the communication between the driver and firmware starts. For more information on how this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).

### Device tree

The device tree below can be used as a reference for an NPU in the subsystem configuration.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the Cortex-M firmware
        ethosu_sram: ethosu_sram@6cf00000 {
            compatible = "shared-dma-pool";
            reg = <0 0x6cf00000 0 0x00100000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };

        // Memory region used by remoteproc to allocate tensor area,
        // shared buffers etc for the Cortex-M firmware
        ethosu_ddr: ethosu_ddr@A4000000 {
            compatible = "shared-dma-pool";
            reg = <0 0xA4000000 0 0x1000000>;
            no-map;
        };
    };

    // Message Handling Unit (MHU). Please refer to the MHU driver for details
    mhuv1: mhu@6ca00000 {
        compatible = "arm,mhu_v1", "arm,primecell";
        reg = <0x0 0x6ca00000 0x0 0x1000>;
        interrupts = <0 168 4>;
        interrupt-names = "npu_rx";
        #mbox-cells = <1>;
        clocks = <&soc_refclk100mhz>;
        clock-names = "apb_pclk";
    };

    // Subsystem reset control. Please refer to the reset driver for details
    juno_ethosu_bridge: juno_ethosu_bridge@6f020000 {
        compatible = "arm,mali_fpga_sysctl";
        #reset-cells = <0x1>;
        reg = <0x0 0x6f020000 0x0 0x1000>;
    };

    // NPU driver
    ethosu {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "simple-bus";
        ranges;

        // Address mappings to translate between bus addresses (Cortex-M) and physical host CPU addresses
        dma-ranges = <0 0x00000000 0 0x6cf00000 0 0x00100000>,
                     <0 0x60000000 0 0x80000000 0 0x25000000>;

        // Remoteproc driver
        ethosu-rproc {
            compatible = "arm,ethosu-rproc";

            // Memory regions for the firmware
            reg = <0 0x6cf00000 0 0x00100000>,
                  <0 0xA4000000 0 0x01000000>;
            reg-names = "rom", "shared";
            memory-region = <&ethosu_reserved>;

            // Mailbox IRQ communication
            mboxes = <&mhuv1 0>, <&mhuv1 0>;
            mbox-names = "tx", "rx";

            // Reset handler
            resets = <&juno_ethosu_bridge 0>;
        };
    };
};
```

# Driver library

The purpose of the driver library is to provide user-friendly C++ and Python APIs for dispatching inferences to the NPU kernel driver.

As the component diagram below illustrates, the network is separated from the inference, allowing multiple inferences to share the same network. The buffer class is used to store IFM and OFM data.

![Driver library](docs/driver_library_component.svg "Driver library component diagram")

# Ethos-U delegate

The delegate library consists of a TensorFlow Lite (TFLite) delegate that can offload the custom Ethos-U operator in networks to the NPU when used in a direct drive configuration. The delegate is implemented using the driver library APIs to communicate with the NPU kernel driver. It is provided as an external delegate shared library that can be loaded into TFLite-based applications.

For more information about delegates and how they are implemented, please refer to the [LiteRT Delegates](https://ai.google.dev/edge/litert/performance/delegates) documentation.

## Usage example

The Model Benchmark Tool provided by TFLite can be used to run a network with the delegate.

```bash
./benchmark_model --external_delegate_path=./libethosu_op_delegate.so --graph=network.tflite
```

For more information about the tool and how to build it, please refer to the [TFLite Model Benchmark Tool](https://github.com/tensorflow/tensorflow/blob/v2.17.0/tensorflow/lite/tools/benchmark/README.md) documentation.

# Inference runner

The inference runner is a utility application provided with the driver stack that can be used to dispatch inferences to the NPU in a subsystem configuration. It is implemented using the driver library APIs to communicate with the NPU kernel driver.

# Building

The driver stack comes with a CMake-based build system. A toolchain file is provided as a reference on how to cross compile for AArch64-based systems. The driver stack build system has been verified on the following:

* Ubuntu 22.04 LTS x86 64-bit Linux distribution
* Ubuntu 22.04 LTS Arm64 Linux distribution

Note that if your host system provides cross compilers and libraries that are newer than what is supported on your target system, you might need to download an older version of the compilers and toolchains for your target system. While this is out of scope for this README, an example [toolchain file](cmake/toolchain/aarch64-linux-gnu-custom.cmake) is provided to show what it could look like. Another option is to run a Docker image of a Linux distribution suited to your build needs.

Building the kernel modules requires a configured Linux kernel source tree and a minimum Sparse version matching commit `0196afe16a50c76302921b139d412e82e5be2349`. Please refer to the official Linux kernel documentation for instructions on how to configure and build the Linux kernel and Sparse.

```
$ cmake -B build --toolchain $PWD/cmake/toolchain/aarch64-linux-gnu.cmake -DKDIR=
$ cmake --build build
```

## Compiler flags used

Refer to the appropriate toolchain file and the corresponding document for a list of compiler flags used.

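Because building the kernel modules requires a configured kernel source tree, it can help to sanity-check the directory passed as `KDIR` before invoking CMake. The sketch below is only an illustration and not part of the driver stack: the helper name `check_kdir` is hypothetical, and it assumes the kernel was configured in-tree, so that the `.config` file sits next to the top-level `Makefile`.

```bash
# Hypothetical helper: returns 0 if the given directory looks like a
# configured Linux kernel source tree, non-zero otherwise.
check_kdir() {
    kdir="$1"
    if [ -z "$kdir" ] || [ ! -d "$kdir" ]; then
        echo "KDIR '$kdir' is not a directory" >&2
        return 1
    fi
    # A kernel source tree has a top-level Makefile
    if [ ! -f "$kdir/Makefile" ]; then
        echo "No top-level Makefile found in '$kdir'" >&2
        return 1
    fi
    # .config is written when the kernel is configured
    if [ ! -f "$kdir/.config" ]; then
        echo "'$kdir' is not configured (missing .config)" >&2
        return 1
    fi
    return 0
}
```

With a kernel tree at a placeholder location such as `/path/to/linux`, the check can guard the configure step: `check_kdir /path/to/linux && cmake -B build --toolchain $PWD/cmake/toolchain/aarch64-linux-gnu.cmake -DKDIR=/path/to/linux`.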
# Tested kernel versions

The Linux driver stack has been tested and validated with the following Linux kernel versions:

* v5.16.20
* v5.19.14
* v6.1.134

# Licenses

The kernel drivers are provided under a GPL v2 license. All other software components are provided under an Apache 2.0 license.

Please see [LICENSE-APACHE-2.0.txt](LICENSE-APACHE-2.0.txt) and [LICENSE-GPL-2.0.txt](LICENSE-GPL-2.0.txt) for more information.

The [Userspace API (UAPI)](kernel/uapi/ethosu.h) has a 'WITH Linux-syscall-note' exception to the license. Please see [Linux-syscall-note](Linux-syscall-note.txt) for more information.

# Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give us your permission. For this process we use the Developer Certificate of Origin (DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your contribution by adding a line with your name and e-mail address to every git commit message. You must use your real name; pseudonyms and anonymous contributions are not accepted. If there is more than one contributor, each contributor adds their name and e-mail address to the commit message.

```
Author: John Doe <john.doe@example.org>
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

The contributions will be code reviewed by Arm before they can be accepted into the repository.

To submit a contribution, open a merge request against the [linux_driver_stack](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-linux-driver-stack) repository. To do this you will need to sign up at [gitlab.arm.com](https://gitlab.arm.com) and add your SSH key under your settings.

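The sign-off line required by the DCO can be added automatically with `git commit -s` (`--signoff`), which appends a `Signed-off-by: Name <email>` trailer built from the configured `user.name` and `user.email`. As a minimal sketch, the hypothetical helper below (not part of this repository) checks that a commit message carries at least one such line before it is pushed.

```bash
# Add the sign-off automatically when committing:
#   git commit -s -m "Title of the commit"

# Hypothetical helper: reads a commit message on stdin and succeeds only if
# it contains at least one "Signed-off-by: Name <email>" line.
has_signoff() {
    grep -Eq '^Signed-off-by: .+ <[^<>[:space:]]+@[^<>[:space:]]+>$'
}
```

For example, `git log -1 --format=%B | has_signoff || echo "latest commit is not signed off"` checks the most recent commit in the current repository.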
In order to submit a contribution push your patch to

# Security

## Limit access to Ethos-U driver device

The Linux driver stack does not provide any access control for the character device created by the Ethos-U driver. It is up to the user of the Ethos-U driver to restrict which applications have access to the device.

## Unrestricted NPU memory access

The NPU does not come with any hardware to restrict which memory locations it can access. It is up to the user to provide and configure such hardware in the system to restrict memory access.

## Report security related issues

Please see [Security](SECURITY.md).

# Trademark notice

Arm, Cortex and Ethos are registered trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.