# Linux driver stack for Arm® Ethos™-U

## Glossary of terms, Abbreviations and Acronyms

* **NPU:** Refers to an Ethos-U Neural Processing Unit (NPU).
* **Subsystem:** A hardware configuration where the host CPU communicates with a subsystem consisting of an Arm Cortex®-M CPU and an NPU.
* **ML Island:** Another term for subsystem.
* **Direct drive:** A hardware configuration where the host CPU communicates directly with the NPU.
* **TFLite:** TensorFlow Lite
* **MHU:** Message handling unit used to pass messages between processors

## Introduction

The Linux driver stack for the Ethos-U NPU contains drivers, libraries and applications that make the NPU available for use in Linux userspace.

## Hardware configurations

The Linux driver stack supports using the NPU in two mutually exclusive hardware configurations.

### Subsystem configuration

In the subsystem configuration, a subsystem consisting of a Cortex-M CPU and an NPU is used. In this configuration, the Cortex-M manages the NPU and the host CPU communicates requests to the subsystem using a mailbox.

![Subsystem configuration](docs/ethos-u_subsystem_config.png "Subsystem hardware configuration")

### Direct drive configuration

In the direct drive configuration, the host CPU is responsible for managing the NPU directly.

**Note:** Only the Ethos-U65 and Ethos-U85 NPUs are supported in the direct drive configuration.

![Direct drive configuration](docs/ethos-u_direct_config.png "Direct drive hardware configuration")

## Project folder structure

To make a clear separation between the different components in the driver stack, the source files are placed into separate directories according to their purpose. Components are only allowed to use the header files from the include folders of other components.
* **cmake:** Contains toolchain files for the CMake build system
* **delegate:** TFLite delegate to offload the custom Ethos-U operator to the NPU (only used in the direct drive configuration)
* **docs:** Documentation files
* **driver_library:** Library that provides a C++ and Python API for the NPU Linux kernel driver
* **kernel:** NPU Linux kernel driver
* **mailbox:** Linux kernel driver for the message handling unit used to pass messages between the host CPU and subsystem
* **remoteproc:** [Remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) based Linux kernel driver used to set up and communicate with the subsystem
* **tests:** Tests for the driver stack
* **thirdparty:** Third-party components
* **tools:** Tools used in the driver stack
* **utils:** Utility applications for the driver stack

# Linux kernel drivers

## Mailbox driver

The mailbox driver is responsible for enabling message passing between the host CPU and the Cortex-M in the subsystem. The driver is expected to be implemented as a [Mailbox controller driver](https://docs.kernel.org/driver-api/mailbox.html) so that other drivers can use it to pass messages to the subsystem.

The driver stack provides mailbox drivers for the Arm MHU versions 1 and 2. Please refer to the files in the mailbox folder for more details.

**Note:** This driver is only needed for the subsystem configuration.

## Reset driver

The reset driver is responsible for resetting the subsystem. The driver is expected to be implemented as a [Reset controller driver](https://docs.kernel.org/driver-api/reset.html#reset-controller-driver-api) so that other drivers can use it to reset the subsystem.

The driver stack provides reset drivers for Juno and Corstone1000. Please refer to the files in the remoteproc folder for more details.

**Note:** This driver is only needed for the subsystem configuration.

## Remoteproc driver

The remoteproc driver is based on the [remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) and is responsible for loading the firmware binary, allocating memory for the firmware, configuring the firmware and booting the firmware on the Cortex-M CPU. To communicate with and reset the subsystem, the remoteproc driver uses the mailbox and reset drivers. Please refer to [ethosu_remoteproc.c](remoteproc/ethosu_remoteproc.c) for more details.

Once the firmware has booted on the Cortex-M CPU, it creates an rpmsg based communication channel between the NPU kernel driver and the firmware. Rpmsg uses shared buffers set up by the remoteproc driver to pass data between the NPU kernel driver and the firmware.

**Note:** This driver is only needed for the subsystem configuration.

### Parameters

The remoteproc driver supports two parameters when being loaded.

* **filename:** Filename of the firmware binary in `/lib/firmware` to load for the Cortex-M CPU.
* **auto_boot:** Indicates if the Cortex-M CPU should be automatically booted after loading the firmware.

## NPU kernel driver

The NPU kernel driver provides a [Userspace API (UAPI)](kernel/uapi/ethosu.h) that Linux userspace applications use to dispatch inferences to the NPU. How the NPU is managed and how inferences are executed depends on the hardware configuration. The kernel driver detects which hardware configuration is used from the device tree and configures itself accordingly. For more information on how this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).
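
To make the device tree based selection more concrete, the sketch below checks from userspace which configuration a running system's device tree describes. It is only an illustration and not how the kernel driver performs the detection (see `ethosu_driver.c` for that). It assumes the compatible strings `arm,ethosu-direct` and `arm,ethosu-rproc` from the reference device trees in this document, and that the flattened device tree is exposed under `/proc/device-tree`, where each `compatible` property is a file of NUL-separated strings. The function names are hypothetical.

```python
import os

# Compatible strings taken from the reference device trees in this document.
COMPATIBLE_TO_CONFIG = {
    "arm,ethosu-direct": "direct drive",
    "arm,ethosu-rproc": "subsystem",
}


def read_compatible(path):
    """Return the NUL-separated strings stored in a `compatible` property file."""
    with open(path, "rb") as prop:
        raw = prop.read()
    return [s.decode("ascii", errors="replace") for s in raw.split(b"\0") if s]


def detect_configurations(dt_root="/proc/device-tree"):
    """Walk the device tree and return the set of Ethos-U configurations found."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(dt_root):
        if "compatible" not in filenames:
            continue
        for value in read_compatible(os.path.join(dirpath, "compatible")):
            if value in COMPATIBLE_TO_CONFIG:
                found.add(COMPATIBLE_TO_CONFIG[value])
    return found


if __name__ == "__main__":
    configs = detect_configurations()
    print(", ".join(sorted(configs)) if configs else "no Ethos-U node found")
```

Because the two hardware configurations are mutually exclusive, a correctly written device tree should yield at most one entry in the returned set.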

### Direct drive configuration

In the direct drive configuration, the NPU kernel driver is fully responsible for managing the NPU. This means it has to configure the NPU's memory interfaces and memory regions, handle NPU interrupts, handle power management, and manage the queuing and execution of inferences requested by userspace.

### Subsystem configuration

In the subsystem configuration, the NPU kernel driver handles all the userspace requests and converts them into messages sent to the firmware running on the Cortex-M in the subsystem. The firmware in turn parses the messages and performs the actions required to fulfill the requests, e.g. running an inference.

Messages are passed to and from the firmware using the rpmsg communication channel that is set up by the firmware when it boots. To get access to the rpmsg communication channel, the NPU kernel driver registers with the rpmsg driver in the Linux kernel when it is loaded, stating that it needs the channel. When the channel becomes available, the Linux kernel notifies the NPU kernel driver and the communication between the driver and the firmware starts. Please refer to [ethosu_rpmsg_mailbox.c](kernel/rpmsg/ethosu_rpmsg_mailbox.c) for more details about these messages.

### Folder structure

To make a clear distinction between the implementation of the two hardware configurations in the kernel driver, the source and header files are separated into configuration specific subfolders, and a common subfolder for everything that is shared between them.

* **common:** Files used by both configurations
* **direct:** Files for the direct drive configuration
* **rpmsg:** Files for the subsystem configuration
* **include:** Header files for the driver and UAPI

## Device tree

This section contains references for how the device tree is expected to look for the direct drive and subsystem configurations.

### Direct drive configuration

The device tree below can be used as a reference for an NPU in the direct drive configuration.

**Note:** The `ethosu_mem_config` and `ethosu_axi_config` nodes are mutually exclusive and only one should appear in the device tree depending on the NPU used.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the SRAM used by the NPU
        ethosu_sram: ethosu_sram@6c000000 {
            reg = <0 0x6c000000 0 0x200000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };
    };

    // NPU driver
    ethosu@6d700000 {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "arm,ethosu-direct";

        // Base address and size of NPU registers
        reg = <0 0x6d700000 0 0x2FFF>;

        memory-region = <&ethosu_reserved>;
        sram = <&ethosu_sram>;

        // Address mappings to translate between bus addresses (NPU) and physical host CPU addresses
        dma-ranges = <0 0x6c000000 0 0x6c000000 0 0x200000>,
                     <0 0x84000000 0 0x84000000 0 0x2000000>;

        interrupts = <0 168 4>;
        interrupt-names = "irq";

        // Memory region configuration
        region-cfgs = <3 3 0 3 3 3 3 3>;

        // Memory regions used for the command stream
        cs-region = <2>;

        // Memory interface configuration for Ethos-U85
        ethosu_mem_config {
            compatible = "arm,ethosu-mem-config";
            //
            sram = <0 64 32>;
            ext = <1 64 32>;
            //
            configs = <0 0 0>, <0 0 0>, <0 0 1>, <0 0 1>;
        };

        // Memory interface configuration for Ethos-U65
        ethosu_axi_config {
            compatible = "arm,ethosu-axi-config";
            // AXI port0
            // AXI port0
            // AXI port1
            // AXI port1
            configs = <0 0 64 32>, <0 0 64 32>, <0 0 64 32>, <0 0 64 32>;
        };
    };
};
```

### Subsystem configuration

The device tree below can be used as a reference for an NPU in the subsystem configuration.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the Cortex-M firmware
        ethosu_sram: ethosu_sram@6cf00000 {
            compatible = "shared-dma-pool";
            reg = <0 0x6cf00000 0 0x00100000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };

        // Memory region used by remoteproc to allocate tensor area,
        // shared buffers etc for the Cortex-M firmware
        ethosu_ddr: ethosu_ddr@A4000000 {
            compatible = "shared-dma-pool";
            reg = <0 0xA4000000 0 0x1000000>;
            no-map;
        };
    };

    // Message Handling Unit (MHU). Please refer to the MHU driver for details
    mhuv1: mhu@6ca00000 {
        compatible = "arm,mhu_v1", "arm,primecell";
        reg = <0x0 0x6ca00000 0x0 0x1000>;
        interrupts = <0 168 4>;
        interrupt-names = "npu_rx";
        #mbox-cells = <1>;
        clocks = <&soc_refclk100mhz>;
        clock-names = "apb_pclk";
    };

    // Subsystem reset control. Please refer to the reset driver for details
    juno_ethosu_bridge: juno_ethosu_bridge@6f020000 {
        compatible = "arm,mali_fpga_sysctl";
        #reset-cells = <0x1>;
        reg = <0x0 0x6f020000 0x0 0x1000>;
    };

    // NPU driver
    ethosu {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "simple-bus";
        ranges;

        // Address mappings to translate between bus addresses (Cortex-M) and physical host CPU addresses
        dma-ranges = <0 0x00000000 0 0x6cf00000 0 0x00100000>,
                     <0 0x60000000 0 0x80000000 0 0x25000000>;

        // Remoteproc driver
        ethosu-rproc {
            compatible = "arm,ethosu-rproc";

            // Memory regions for the firmware
            reg = <0 0x6cf00000 0 0x00100000>,
                  <0 0xA4000000 0 0x01000000>;
            reg-names = "rom", "shared";
            memory-region = <&ethosu_reserved>;

            // Mailbox IRQ communication
            mboxes = <&mhuv1 0>, <&mhuv1 0>;
            mbox-names = "tx", "rx";

            // Reset handler
            resets = <&juno_ethosu_bridge 0>;
        };
    };
};
```

## Inference flow

How an inference is managed and executed depends on the hardware configuration used.

### Direct drive

For the direct drive configuration, the NPU is directly managed by the NPU kernel driver. Unlike the subsystem configuration, where the firmware running on the Cortex-M CPU can be built with bundled networks, the direct drive configuration does not have any firmware and therefore does not support bundled networks.

The sequence diagram below shows an overview of how the direct drive implementation handles an inference.

![Direct drive sequence](docs/ethos-u_linux_direct_sequence.svg "Direct drive components and sequence")

### Subsystem

For the subsystem configuration, the NPU is fully managed by the firmware running on the subsystem Cortex-M CPU. The NPU kernel driver is only responsible for managing the requests from userspace and passing messages to the subsystem to perform the required actions.

The firmware running on the Cortex-M is typically implemented using a real-time operating system (RTOS) and the [OpenAMP](https://www.openampproject.org/) framework. OpenAMP includes support for the [Remote Processor Framework (remoteproc)](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) and [Remote Processor Messaging (rpmsg)](https://www.kernel.org/doc/html/latest/staging/rpmsg.html), which are used to set up and communicate with the subsystem.

If one or more networks will be used frequently with the NPU, they can be bundled with the firmware and referenced by index in the UAPI, which avoids having to pass the network data later.

The sequence diagram below shows an overview of how the subsystem implementation handles an inference.

![Subsystem sequence](docs/ethos-u_linux_sequence.svg "Subsystem (ML Island) components and sequence")

# Driver library

The purpose of the driver library is to provide user friendly C++ and Python APIs for dispatching inferences to the NPU kernel driver.

As the component diagram below illustrates, the network is separated from the inference, allowing multiple inferences to share the same network.
The buffer class is used to store IFM and OFM data.

![Driver library](docs/driver_library_component.svg "Driver library component diagram")

# Ethos-U delegate for direct drive

The delegate library consists of a TFLite delegate that can offload the custom Ethos-U operator in networks to the NPU when used in a direct drive configuration. The delegate is implemented using the driver library APIs to communicate with the NPU kernel driver. It is provided as an external delegate shared library that can be loaded into TFLite based applications.

For more information about delegates and how they are implemented, please refer to the [LiteRT Delegates](https://ai.google.dev/edge/litert/performance/delegates) documentation.

# Running inferences

How to run inferences, and how the networks need to be compiled, depends on the hardware configuration used.

For information about how to compile networks, please refer to the [Vela](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela) documentation.

## IFM and OFM data

The utility applications provided by the driver stack expect IFM data to be provided as a single concatenated binary file where the inputs are in the order required by the network.

If the IFM data is in NumPy format, it can be converted to a binary file using the example below.

```sh
python3 -c 'import numpy;i=numpy.load("input.npy");i.tofile("input.bin");'
```

If the IFM data is in separate binary files, the binary files can be concatenated using the example below.

```sh
cat input0.bin input1.bin > input.bin
```

The utility applications provide the inference result as a single OFM file that contains the concatenated output binary data in the order specified by the network.

## Direct drive configuration

In the direct drive configuration, only the NPU kernel driver module (`ethosu.ko`) needs to be loaded into the Linux kernel to make the NPU available for use.

To run inferences, the driver stack provides the `delegate_runner` utility application.
It is implemented using the TFLite library and supports using the Ethos-U delegate to offload supported operators to the NPU.

#### Usage example

```sh
./delegate_runner -l libethosu_op_delegate.so -n -i -o
```

For more information about the delegate runner, please see the help message in the application.

## Subsystem configuration

In the subsystem configuration, multiple kernel modules need to be loaded into the Linux kernel to make the NPU available for use:

* Mailbox driver (`arm_mhu.ko` or equivalent for target system)
* Reset driver (`juno_fpga_reset.ko` or equivalent for target system)
* Remoteproc driver (`ethosu_remoteproc.ko`)
* NPU kernel driver (`ethosu.ko`)

**Note:** The driver modules should be loaded in the order specified above.

To run inferences, the driver stack provides the `inference_runner` utility application. It is implemented using the driver library APIs to communicate with the NPU kernel driver.

#### Usage example

```sh
./inference_runner -n -i -o
```

For more information about the inference runner, please see the help message in the application.

# Building

The driver stack comes with a CMake based build system. A toolchain file is provided as a reference on how to cross compile for AArch64 based systems.

The driver stack build system has been verified on the following:

* Ubuntu 22.04 LTS x86 64-bit Linux distribution
* Ubuntu 22.04 LTS Arm64 Linux distribution

Note that if your host system provides cross compilers and libraries that are newer than what is supported on your target system, you might need to download older versions of the compilers and toolchains for your target system. While out of scope for this README, an example [toolchain file](cmake/toolchain/aarch64-linux-gnu-custom.cmake) is provided to show what it could look like. Another option is to run a Docker image of a Linux distribution suited to your build needs.

Building the kernel modules requires a configured Linux kernel source tree and a minimum Sparse version matching commit `0196afe16a50c76302921b139d412e82e5be2349`. Please refer to the Linux kernel official documentation for instructions on how to configure and build the Linux kernel and Sparse.

```sh
$ cmake -B build --toolchain $PWD/cmake/toolchain/aarch64-linux-gnu.cmake -DKDIR=
$ cmake --build build
```

## Compiler flags used

Refer to the appropriate toolchain file and the corresponding document for a list of compiler flags used.

# Kernel versions

The Linux driver stack has been tested and validated with the following Linux kernel versions:

* v5.16.20
* v5.19.14
* v6.1.134

Kernel versions below 5.16 are currently not supported.

# Licenses

The kernel drivers are provided under a GPL v2 license. All other software components are provided under an Apache 2.0 license.

Please see [LICENSE-APACHE-2.0.txt](LICENSE-APACHE-2.0.txt) and [LICENSE-GPL-2.0.txt](LICENSE-GPL-2.0.txt) for more information.

The [Userspace API (UAPI)](kernel/uapi/ethosu.h) has a 'WITH Linux-syscall-note' exception to the license. Please see [Linux-syscall-note](Linux-syscall-note.txt) for more information.

# Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give us your permission. For this process we use the Developer Certificate of Origin (DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your contribution by adding a line with your name and e-mail address to every git commit message. You must use your real name; no pseudonyms or anonymous contributions are accepted. If there is more than one contributor, every contributor adds their name and e-mail to the commit message.

```
Author: John Doe
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

The contributions will be code reviewed by Arm before they can be accepted into the repository.

In order to submit a contribution, submit a merge request to the [linux_driver_stack](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-linux-driver-stack) repository. To do this you will need to sign up at [gitlab.arm.com](https://gitlab.arm.com) and add your SSH key under your settings.

In order to submit a contribution push your patch to

# Known limitations

## Q-Channel support

The Q-channel support in the NPU kernel driver is limited: the driver disables both clock and power gating during NPU reset. This means that the clock and power for the NPU are always on. The exception is when the driver prepares for system or runtime suspend, at which point clock and power gating are enabled so the NPU can be powered off.

# Security

## Limit access to Ethos-U driver device

The Linux driver stack does not provide any access control for the character device created by the Ethos-U driver. It is up to the user of the Ethos-U driver to restrict which applications have access to the device.

## Unrestricted NPU memory access

The NPU does not come with any hardware to restrict which memory locations it can access. It is up to the user to provide and configure such hardware in the system to restrict memory access.

## Report security related issues

Please see [Security](SECURITY.md).

# Trademark notice

Arm, Cortex and Ethos are registered trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.