
# Linux driver stack for Arm® Ethos™-U

## Glossary of terms, Abbreviations and Acronyms

* **NPU:** Refers to an Ethos-U Neural Processing Unit (NPU).
* **Subsystem:** A hardware configuration where the host CPU communicates with a subsystem consisting of an Arm Cortex®-M CPU and an NPU.
* **ML Island:** Another term for subsystem.
* **Direct drive:** A hardware configuration where the host CPU communicates directly with the NPU.

## Introduction

The Linux driver stack for Ethos-U NPU contains drivers, libraries and
applications to make the NPU available for use in Linux userspace.

## Hardware configurations

The Linux driver stack supports using the NPU in two hardware configurations
that are mutually exclusive.

### Subsystem configuration

In the subsystem configuration, a subsystem consisting of a Cortex-M CPU and an
NPU is used. In this configuration, the Cortex-M manages the NPU and the host
CPU communicates requests to the subsystem using a mailbox.

![Subsystem configuration](docs/ethos-u_subsystem_config.png "Subsystem hardware configuration")

### Direct drive configuration

In the direct drive configuration, the host CPU is responsible for managing the
NPU directly.

**Note:** Only the Ethos-U65 and Ethos-U85 NPUs are supported in the direct
drive configuration.

![Direct drive configuration](docs/ethos-u_direct_config.png "Direct drive hardware configuration")

## Project folder structure

To make a clear separation between the different components in the driver
stack, the source files are placed into separate directories according to their
purpose. A component may only use header files from another component's
include folder.

* **cmake:** Contains toolchain files for the CMake build system
* **delegate:** TensorFlow Lite delegate to offload the custom Ethos-U operator to the NPU (Only used in the direct drive configuration)
* **docs:** Documentation files
* **driver_library:** Library that provides a C++ and Python API for the NPU Linux kernel driver
* **kernel:** NPU Linux kernel driver
* **mailbox:** Linux kernel driver for the message handling unit used to pass messages between the host CPU and subsystem
* **remoteproc:** [Remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) based Linux kernel driver used to set up and communicate with the subsystem
* **tests:** Tests for the driver stack
* **thirdparty:** Third-party components
* **tools:** Tools used in the driver stack
* **utils:** Utility applications for the driver stack

# Linux Kernel drivers

The NPU kernel driver provides a [Userspace API (UAPI)](kernel/uapi/ethosu.h)
that Linux userspace applications will use to dispatch inferences to the NPU.

How the inferences are executed on the NPU depends on the hardware configuration:

* In the subsystem configuration, the inference request is packaged as a
message and passed to the subsystem. The subsystem is then responsible for
executing the inference.

* In the direct drive configuration, the kernel driver will program the NPU directly
to execute the requested inference.

The kernel driver will detect what hardware configuration is used from the
device tree and configure the driver accordingly. For more information on how
this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).
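
As a rough illustration of the idea, the sketch below classifies a device tree
by the `compatible` strings that appear in the reference device trees in this
README (`arm,ethosu-direct` and `arm,ethosu-rproc`). It is not the driver's
code; the function name is hypothetical and the actual detection logic lives
in `ethosu_driver.c`.

```python
# Compatible strings taken from the reference device trees in this README.
DIRECT_COMPATIBLE = "arm,ethosu-direct"
SUBSYSTEM_COMPATIBLE = "arm,ethosu-rproc"


def classify_configuration(compatibles):
    """Return "direct", "subsystem" or None for a list of compatible strings.

    Hypothetical helper for illustration only; it is not part of the driver.
    """
    found = set(compatibles)
    has_direct = DIRECT_COMPATIBLE in found
    has_subsystem = SUBSYSTEM_COMPATIBLE in found
    if has_direct and has_subsystem:
        # The two hardware configurations are mutually exclusive.
        raise ValueError("both direct drive and subsystem nodes are present")
    if has_direct:
        return "direct"
    if has_subsystem:
        return "subsystem"
    return None
```

For example, `classify_configuration(["simple-bus", "arm,ethosu-rproc"])`
yields `"subsystem"`.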

## Driver folder structure

To make a clear distinction between the implementation of the two hardware
configurations in the kernel driver, the source and header files are separated
into configuration specific subfolders, and a common subfolder for everything
that is shared between them.

* **common:** Files used by both the configurations
* **direct:** Files for the direct drive configuration
* **rpmsg:** Files for the subsystem configuration
* **include:** Header files for the driver and UAPI

## Direct drive

In the direct drive configuration, the kernel driver is fully responsible for
managing the NPU. It has to configure the NPU's memory interfaces and memory
regions, handle NPU interrupts, handle power management, and manage the
queuing and execution of inferences.
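
The queuing part can be pictured with a toy model. The sketch below is only
an assumption-laden illustration: it assumes first-in, first-out ordering and
a single inference in flight, neither of which is stated in this README, and
the class and method names are hypothetical rather than taken from the driver.

```python
from collections import deque


class InferenceQueue:
    """Toy model: one inference runs at a time, the rest wait in FIFO order."""

    def __init__(self):
        self.pending = deque()
        self.running = None
        self.completed = []

    def submit(self, inference_id):
        """Queue an inference and start it if the (simulated) NPU is idle."""
        self.pending.append(inference_id)
        self._schedule()

    def on_irq(self):
        """Simulate the NPU interrupt that signals the running job is done."""
        if self.running is None:
            return
        self.completed.append(self.running)
        self.running = None
        self._schedule()

    def _schedule(self):
        # Start the next pending inference only when nothing is running.
        if self.running is None and self.pending:
            self.running = self.pending.popleft()
```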

In the subsystem configuration, the firmware running on the Cortex-M CPU can be
built with bundled networks. The direct drive configuration does not have any
firmware and therefore does not support bundled networks.

The sequence diagram below shows an overview of how the direct drive
implementation handles an inference.

![Direct drive sequence](docs/ethos-u_linux_direct_sequence.svg "Direct drive components and sequence")

### Device tree

The device tree below can be used as a reference for an NPU in the direct drive
configuration.

**Note:** The `ethosu_mem_config` and `ethosu_axi_config` nodes are mutually
exclusive. Only one of them should appear in the device tree, depending on the
NPU used.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the SRAM used by the NPU
        ethosu_sram: ethosu_sram@6c000000 {
            reg = <0 0x6c000000 0 0x200000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };
    };

    // NPU driver
    ethosu@6d700000 {
        #address-cells = <2>;
        #size-cells = <2>;

        compatible = "arm,ethosu-direct";

        // Base address and size of NPU registers
        reg = <0 0x6d700000 0 0x2FFF>;

        memory-region = <&ethosu_reserved>;
        sram = <&ethosu_sram>;

        // Address mappings to translate between bus addresses (NPU) and physical host CPU addresses
        dma-ranges = <0 0x6c000000 0 0x6c000000 0 0x200000>,
                     <0 0x84000000 0 0x84000000 0 0x2000000>;

        interrupts = <0 168 4>;
        interrupt-names = "irq";

        // Memory region configuration
        region-cfgs = <3 3 0 3 3 3 3 3>;

        // Memory regions used for the command stream
        cs-region = <2>;

        // Memory interface configuration for Ethos-U85
        ethosu_mem_config {
            compatible = "arm,ethosu-mem-config";
            // <beats outstanding_read outstanding_write>
            sram = <0 64 32>;
            ext  = <1 64 32>;
            // <mem_domain mem_type axi_port>
            configs = <0 0 0>,
                      <0 0 0>,
                      <0 0 1>,
                      <0 0 1>;
        };

        // Memory interface configuration for Ethos-U65
        ethosu_axi_config {
            compatible = "arm,ethosu-axi-config";
            // AXI port0 <beats mem_type outstanding_read outstanding_write>
            // AXI port0 <beats mem_type outstanding_read outstanding_write>
            // AXI port1 <beats mem_type outstanding_read outstanding_write>
            // AXI port1 <beats mem_type outstanding_read outstanding_write>
            configs = <0 0 64 32>,
                      <0 0 64 32>,
                      <0 0 64 32>,
                      <0 0 64 32>;
        };
    };
};
```
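
With `#address-cells = <2>` and `#size-cells = <2>`, each `reg` entry is four
32-bit cells: two for the address and two for the size. The Python sketch
below decodes such entries so the region sizes are easier to read. The helper
name is hypothetical and the cell values are copied from the `reserved-memory`
nodes above.

```python
def decode_cells(cells, address_cells=2, size_cells=2):
    """Decode a flat list of 32-bit cells into (address, size) tuples."""
    stride = address_cells + size_cells
    if len(cells) % stride != 0:
        raise ValueError("cell count is not a multiple of address + size cells")

    def join(words):
        # Combine big-endian ordered 32-bit words into one integer.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    entries = []
    for i in range(0, len(cells), stride):
        address = join(cells[i:i + address_cells])
        size = join(cells[i + address_cells:i + stride])
        entries.append((address, size))
    return entries


# reg values copied from the ethosu_sram and ethosu_reserved nodes
sram = decode_cells([0, 0x6C000000, 0, 0x200000])
reserved = decode_cells([0, 0x84000000, 0, 0x20000000])
for address, size in sram + reserved:
    print(hex(address), size // (1024 * 1024), "MiB")
```

Decoded this way, the SRAM region is 2 MiB at `0x6c000000` and the DMA pool is
512 MiB at `0x84000000`.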

## Subsystem

For the subsystem configuration, the NPU is fully managed by the firmware
running on the subsystem Cortex-M CPU. The NPU kernel driver is only
responsible for managing the requests from userspace and passing messages to
the subsystem to perform the required actions.

The firmware running on the Cortex-M is typically implemented using a real-time operating system (RTOS) and
the [OpenAMP](https://www.openampproject.org/) framework. OpenAMP includes support for the
[Remote Processor Framework (remoteproc)](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) and
[Remote Processor Messaging (rpmsg)](https://www.kernel.org/doc/html/latest/staging/rpmsg.html), which are
used to set up and communicate with the subsystem.

If one or more networks will be used frequently with the NPU, they can be
bundled with the firmware and referenced with an index in the UAPI, to avoid
having to pass the network data later.
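
The difference between the two ways of referring to a network can be sketched
as follows. This is a hypothetical Python model, not the UAPI: the class and
field names are invented for illustration, and the real definitions are in
`kernel/uapi/ethosu.h`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkRef:
    """Hypothetical descriptor: exactly one of `index` or `data` is set."""

    index: Optional[int] = None   # refers to a network bundled with the firmware
    data: Optional[bytes] = None  # network passed along with the request

    def __post_init__(self):
        if (self.index is None) == (self.data is None):
            raise ValueError("set exactly one of index or data")

    def payload_size(self):
        """Bytes of network data that must accompany the request."""
        return 0 if self.data is None else len(self.data)
```

A bundled network referenced as `NetworkRef(index=0)` carries no network
payload, which is the saving described above.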

The sequence diagram below shows an overview of how the subsystem
implementation handles an inference.

![Subsystem sequence](docs/ethos-u_linux_sequence.svg "Subsystem (ML Island) components and sequence")

To get the subsystem running and set up the message passing, a set of kernel
drivers is needed.

### Mailbox driver

The mailbox driver is responsible for managing the message handling unit (MHU) and
passing messages between the host CPU and Cortex-M CPU in the subsystem.
The driver creates an interface using the [common mailbox framework](https://docs.kernel.org/driver-api/mailbox.html)
from the Linux kernel to allow other drivers to pass messages using the MHU.

Please refer to files in the mailbox folder for more details.

### Remoteproc driver

The remoteproc driver is based on the remote processor framework and is
responsible for loading the firmware binary, allocating memory for the
firmware, configuring the firmware and booting the firmware on the Cortex-M
CPU. The remoteproc driver uses the mailbox client interface created by the
mailbox driver to communicate with the firmware.

Please refer to [ethosu_remoteproc.c](remoteproc/ethosu_remoteproc.c) for more
details.

Once the firmware has booted on the Cortex-M CPU, it creates an rpmsg-based
communication channel between the NPU kernel driver and the firmware. Rpmsg
uses shared buffers set up by the remoteproc driver to pass data between the
NPU kernel driver and the firmware.

### NPU kernel driver

The NPU kernel driver handles all the userspace requests and passes messages to
the firmware to perform the actions needed to fulfill them. The messages are
passed to and from the firmware using the rpmsg communication channel that the
firmware has set up.

Please refer to [ethosu_rpmsg_mailbox.c](kernel/rpmsg/ethosu_rpmsg_mailbox.c) for more details about these
messages.

To get access to the rpmsg communication channel, the NPU kernel driver
registers that it needs the channel with the rpmsg driver in the Linux kernel
when it is loaded. When the channel becomes available, the Linux kernel
notifies the NPU kernel driver and communication between the driver and the
firmware starts.

For more information on how this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).

### Device tree

The device tree below can be used as a reference for an NPU in the subsystem configuration.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the Cortex-M firmware
        ethosu_sram: ethosu_sram@6cf00000 {
            compatible = "shared-dma-pool";
            reg = <0 0x6cf00000 0 0x00100000>;
            no-map;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };

        // Memory region used by remoteproc to allocate tensor area,
        // shared buffers etc for the Cortex-M firmware
        ethosu_ddr: ethosu_ddr@A4000000 {
            compatible = "shared-dma-pool";
            reg = <0 0xA4000000 0 0x1000000>;
            no-map;
        };
    };


    // Message Handling Unit (MHU). Please refer to the MHU driver for details
    mhuv1: mhu@6ca00000 {
        compatible = "arm,mhu_v1", "arm,primecell";
        reg = <0x0 0x6ca00000 0x0 0x1000>;
        interrupts = <0 168 4>;
        interrupt-names = "npu_rx";
        #mbox-cells = <1>;
        clocks = <&soc_refclk100mhz>;
        clock-names = "apb_pclk";
    };

    // Subsystem reset control. Please refer to the reset driver for details
    juno_ethosu_bridge: juno_ethosu_bridge@6f020000 {
        compatible = "arm,mali_fpga_sysctl";
        #reset-cells = <0x1>;
        reg = <0x0 0x6f020000 0x0 0x1000>;
    };


    // NPU driver
    ethosu {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "simple-bus";
        ranges;

        // Address mappings to translate between bus addresses (Cortex-M) and physical host CPU addresses
        dma-ranges = <0 0x00000000 0 0x6cf00000 0 0x00100000>,
                     <0 0x60000000 0 0x80000000 0 0x25000000>;

        // Remoteproc driver
        ethosu-rproc {
            compatible = "arm,ethosu-rproc";

            // Memory regions for the firmware
            reg = <0 0x6cf00000 0 0x00100000>,
                  <0 0xA4000000 0 0x01000000>;
            reg-names = "rom", "shared";

            memory-region = <&ethosu_reserved>;

            // Mailbox IRQ communication
            mboxes = <&mhuv1 0>, <&mhuv1 0>;
            mbox-names = "tx", "rx";

            // Reset handler
            resets = <&juno_ethosu_bridge 0>;
        };
    };
};
```
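
The `dma-ranges` property above describes how addresses seen by the Cortex-M
map to host physical addresses. The Python sketch below applies that mapping
in both directions. The function names are hypothetical, and the table assumes
each entry is read as bus address, host address, size with two cells each.

```python
# (bus_address, host_address, size) taken from the dma-ranges property above
DMA_RANGES = [
    (0x00000000, 0x6CF00000, 0x00100000),
    (0x60000000, 0x80000000, 0x25000000),
]


def bus_to_host(bus_address, ranges=DMA_RANGES):
    """Translate a Cortex-M bus address to a host physical address."""
    for bus_base, host_base, size in ranges:
        if bus_base <= bus_address < bus_base + size:
            return host_base + (bus_address - bus_base)
    raise ValueError(f"bus address {bus_address:#x} is not covered by dma-ranges")


def host_to_bus(host_address, ranges=DMA_RANGES):
    """Translate a host physical address to a Cortex-M bus address."""
    for bus_base, host_base, size in ranges:
        if host_base <= host_address < host_base + size:
            return bus_base + (host_address - host_base)
    raise ValueError(f"host address {host_address:#x} is not covered by dma-ranges")


# The firmware region at host 0x6cf00000 appears at bus address 0 on the Cortex-M.
print(hex(host_to_bus(0x6CF00000)))
# The ethosu_ddr region at host 0xa4000000 appears at bus address 0x84000000.
print(hex(host_to_bus(0xA4000000)))
```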

# Driver library

The purpose of the driver library is to provide user friendly C++ and Python
APIs for dispatching inferences to the NPU kernel driver.

As the component diagram below illustrates, the network is separated from the
inference, allowing multiple inferences to share the same network. The buffer
class is used to store IFM and OFM data.

![Driver library](docs/driver_library_component.svg "Driver library component diagram")
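
The relationship between the three concepts can be modelled in a few lines of
Python. The classes below are hypothetical stand-ins written for this README;
they do not reproduce the driver library's C++ or Python API and only show
how several inferences can share one network while owning their own buffers.

```python
class Buffer:
    """Holds raw tensor data (IFM or OFM)."""

    def __init__(self, size):
        self.data = bytearray(size)


class Network:
    """Stand-in for a network; created once and shared by inferences."""

    def __init__(self, name, ifm_size, ofm_size):
        self.name = name
        self.ifm_size = ifm_size
        self.ofm_size = ofm_size


class Inference:
    """One run of a network with its own input and output buffers."""

    def __init__(self, network, ifm):
        if len(ifm.data) != network.ifm_size:
            raise ValueError("IFM buffer size does not match the network")
        self.network = network
        self.ifm = ifm
        self.ofm = Buffer(network.ofm_size)


# Two inferences reuse the same network object but keep separate buffers.
network = Network("example", ifm_size=16, ofm_size=4)
first = Inference(network, Buffer(16))
second = Inference(network, Buffer(16))
```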

# Ethos-U delegate

The delegate library consists of a TensorFlow Lite (TFLite) delegate that can
offload the custom Ethos-U operator in networks to the NPU when used in a
direct drive configuration.

The delegate is implemented using the driver library APIs to communicate with
the NPU kernel driver. It is provided as an external delegate shared library
that can be loaded into TFLite based applications.

For more information about delegates and how they are implemented,
please refer to the [LiteRT Delegates](https://ai.google.dev/edge/litert/performance/delegates) documentation.

## Usage example

The Model Benchmark Tool provided by TFLite can be used to run a network with the delegate.

```bash
./benchmark_model --external_delegate_path=./libethosu_op_delegate.so --graph=network.tflite
```
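
When the benchmark is driven from a script, the same command line can be
assembled programmatically. The Python sketch below only builds the argument
list using the two flags shown above; the helper name is hypothetical and the
paths are the placeholders from the example.

```python
import shlex


def benchmark_command(graph, delegate, tool="./benchmark_model"):
    """Build the argument list for running a network with the delegate."""
    return [
        tool,
        f"--external_delegate_path={delegate}",
        f"--graph={graph}",
    ]


cmd = benchmark_command("network.tflite", "./libethosu_op_delegate.so")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a
target where the tool and the delegate library are available.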

For more information about the tool and how to build it, please refer to the [TFLite Model Benchmark Tool](https://github.com/tensorflow/tensorflow/blob/v2.17.0/tensorflow/lite/tools/benchmark/README.md)
documentation.

# Inference runner

The inference runner is a utility application provided with the driver stack
that can be used to dispatch inferences to the NPU in a subsystem
configuration. It is implemented using the driver library APIs to communicate
with the NPU kernel driver.

# Building

The driver stack comes with a CMake-based build system. A toolchain file is
provided as a reference on how to cross compile for AArch64-based systems. The
driver stack build system has been verified on the following:

* Ubuntu 22.04 LTS x86 64-bit Linux distribution
* Ubuntu 22.04 LTS Arm64 Linux distribution

Note that if your host system provides cross compilers and libraries that are
newer than what is supported on your target system, you might need to download
older versions of the compilers and toolchains for your target system.
While this is out of scope for this README, an example
[toolchain file](cmake/toolchain/aarch64-linux-gnu-custom.cmake) is provided to
show what it could look like. Another option is to run a Docker image of a
Linux distribution suited to your build needs.

Building the kernel modules requires a configured Linux kernel source tree and
a minimum Sparse version matching commit `0196afe16a50c76302921b139d412e82e5be2349`.
Please refer to the Linux kernel official documentation for instructions on how
to configure and build the Linux kernel and Sparse.

```
$ cmake -B build --toolchain $PWD/cmake/toolchain/aarch64-linux-gnu.cmake -DKDIR=<Kernel directory>
$ cmake --build build
```

## Compiler flags used

Refer to the appropriate toolchain file and the corresponding document for
a list of compiler flags used.

# Tested kernel versions

The Linux driver stack has been tested and validated with the following Linux kernel
versions:

* v5.16.20
* v5.19.14
* v6.1.134

# Licenses

The kernel drivers are provided under a GPL v2 license. All other software
components are provided under an Apache 2.0 license.

Please see [LICENSE-APACHE-2.0.txt](LICENSE-APACHE-2.0.txt) and
[LICENSE-GPL-2.0.txt](LICENSE-GPL-2.0.txt) for more information.

The [Userspace API (UAPI)](kernel/uapi/ethosu.h) has a
'WITH Linux-syscall-note' exception to the license. Please see
[Linux-syscall-note](Linux-syscall-note.txt) for more information.

# Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give
us your permission. For this process we use the Developer Certificate of Origin
(DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your
contribution by adding a line with your name and e-mail address to every git
commit message. You must use your real name; pseudonyms and anonymous
contributions are not accepted. If there is more than one contributor, everyone
adds their name and e-mail to the commit message.

```
Author: John Doe <john.doe@example.org>
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

The contributions will be code reviewed by Arm before they can be accepted into
the repository.

To submit a contribution, open a merge request against the
[linux_driver_stack](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-linux-driver-stack)
repository. To do this you will need to sign up at [gitlab.arm.com](https://gitlab.arm.com)
and add your SSH key under your settings.
In order to submit a contribution push your patch to

# Security

## Limit access to Ethos-U driver device

The Linux driver stack does not provide any access control to the character
device created by the Ethos-U driver. It is up to the user of the Ethos-U
driver to restrict which applications have access to the device.
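
One common way to do this is with ordinary file permissions on the device
node. The Python sketch below checks and tightens the permission bits of a
path. The helper names are hypothetical, and the device node path is not
stated in this README, so any path such as `/dev/ethosu0` is an assumption.
On a real system a udev rule is the more usual place to set this.

```python
import os
import stat


def is_world_accessible(path):
    """Return True if users outside the owner and group can read or write path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def restrict_to_owner_and_group(path):
    """Remove all permissions for 'other' users, keeping owner and group bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IRWXO)
```

After calling `restrict_to_owner_and_group()` on the device node, access can
be granted to selected applications by adding their users to the owning group.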

## Unrestricted NPU memory access

The NPU does not come with any hardware to restrict what memory locations it
can access. It is up to the user to provide and configure such hardware in the
system to restrict memory access.

## Report security related issues

Please see [Security](SECURITY.md).

# Trademark notice

Arm, Cortex and Ethos are registered trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.