# Linux driver stack for Arm® Ethos™-U

## Glossary of terms, Abbreviations and Acronyms

* **NPU:** Refers to an Ethos-U Neural Processing Unit (NPU).
* **Subsystem:** A hardware configuration where the host CPU communicates with a subsystem consisting of an Arm Cortex®-M CPU and an NPU.
* **ML Island:** Another term for subsystem.
* **Direct drive:** A hardware configuration where the host CPU communicates directly with the NPU.
* **TFLite:** TensorFlow Lite
* **MHU:** Message handling unit used to pass messages between processors

## Introduction

The Linux driver stack for Ethos-U NPU contains drivers, libraries and
applications to make the NPU available for use in Linux userspace.

## Hardware configurations

The Linux driver stack supports using the NPU in two hardware configurations
that are mutually exclusive.

### Subsystem configuration

In the subsystem configuration, a subsystem consisting of a Cortex-M CPU and an
NPU is used. In this configuration, the Cortex-M manages the NPU and the host
CPU communicates requests to the subsystem using a mailbox.

![Subsystem configuration](docs/ethos-u_subsystem_config.png "Subsystem hardware configuration")

### Direct drive configuration

In the direct drive configuration, the host CPU is responsible for managing the
NPU directly.

**Note:** Only the Ethos-U65 & U85 NPUs are supported in the direct drive
configuration.

![Direct drive configuration](docs/ethos-u_direct_config.png "Direct drive hardware configuration")

## Project folder structure

To make a clear separation between the different components in the driver
stack, the source files are placed into separate directories according to their
purpose. Components are only allowed to use the header files from other
components' include folders.

* **cmake:** Contains toolchain files for the CMake build system
* **delegate:** TFLite delegate to offload the custom Ethos-U operator to the NPU (Only used in the direct drive configuration)
* **docs:** Documentation files
* **driver_library:** Library that provides a C++ and Python API for the NPU Linux kernel driver
* **kernel:** NPU Linux kernel driver
* **mailbox:** Linux kernel driver for the message handling unit used to pass messages between the host CPU and subsystem
* **remoteproc:** [Remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) based Linux kernel driver used to set up and communicate with the subsystem
* **tests:** Tests for the driver stack
* **thirdparty:** Third-party components
* **tools:** Tools used in the driver stack
* **utils:** Utility applications for the driver stack

# Linux kernel drivers

## Mailbox driver

The mailbox driver is responsible for enabling message passing between the host
CPU and Cortex-M in the subsystem.

The driver is expected to be implemented as a [Mailbox controller driver](https://docs.kernel.org/driver-api/mailbox.html)
to enable other drivers to use it to pass messages to the subsystem.

The driver stack provides mailbox drivers for the Arm MHU version 1 and 2.

Please refer to the files in the mailbox folder for more details.

**Note:** This driver is only needed for the Subsystem configuration

## Reset driver

The reset driver is responsible for resetting the subsystem.

The driver is expected to be implemented as a [Reset controller driver](https://docs.kernel.org/driver-api/reset.html#reset-controller-driver-api)
to enable other drivers to use it to reset the subsystem.

The driver stack provides reset drivers for Juno and Corstone1000.

Please refer to the files in the remoteproc folder for more details.

**Note:** This driver is only needed for the Subsystem configuration

## Remoteproc driver

The remoteproc driver is based on the [remote processor framework](https://www.kernel.org/doc/html/latest/staging/remoteproc.html)
and is responsible for loading the firmware binary, allocating memory for the
firmware, configuring the firmware and booting the firmware on the Cortex-M
CPU.

To communicate with and reset the subsystem, the remoteproc driver uses the
mailbox and reset drivers.

Please refer to [ethosu_remoteproc.c](remoteproc/ethosu_remoteproc.c) for more
details.

Once the firmware has booted on the Cortex-M CPU, it will create an rpmsg-based
communication channel between the NPU kernel driver and firmware. Rpmsg uses
shared buffers set up by the remoteproc driver to pass data between the NPU
kernel driver and firmware.

**Note:** This driver is only needed for the Subsystem configuration

### Parameters

The remoteproc driver supports two parameters when being loaded.

* **filename:** Filename of the firmware binary in `/lib/firmware` to load for the Cortex-M CPU.
* **auto_boot:** Indicates if the Cortex-M CPU should be automatically booted after loading the firmware.
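
As a rough sketch of how these parameters might be passed at load time, the
snippet below builds an `insmod` command line. The firmware file name
`ethosu_firmware.elf` and the value `1` for `auto_boot` are assumptions made
for illustration; check the remoteproc driver source for the values it
accepts. The command is only printed, not executed.

```sh
# Hypothetical values; adjust them to your system.
MODULE=ethosu_remoteproc.ko
FIRMWARE=ethosu_firmware.elf   # assumed file name under /lib/firmware
AUTO_BOOT=1                    # assumed encoding of "boot automatically"

# Module parameters are passed to insmod as name=value pairs.
LOAD_CMD="insmod $MODULE filename=$FIRMWARE auto_boot=$AUTO_BOOT"

# Print the command; run it as root on the target to actually load the module.
echo "$LOAD_CMD"
```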

## NPU kernel driver

The NPU kernel driver provides a [Userspace API (UAPI)](kernel/uapi/ethosu.h)
that Linux userspace applications will use to dispatch inferences to the NPU.

How the NPU is managed and inferences are executed depends on the hardware
configuration.

The kernel driver will detect what hardware configuration is used from the
device tree and configure the driver accordingly. For more information on how
this is done, please refer to [ethosu_driver.c](kernel/common/ethosu_driver.c).

### Direct drive configuration

In the direct drive configuration, the NPU kernel driver is fully responsible for
managing the NPU, which means it has to configure the NPU's memory interfaces
and memory regions, handle NPU interrupts, handle power management, and manage
the queuing and execution of inferences requested by userspace.

### Subsystem configuration

In the subsystem configuration, the NPU kernel driver handles all the userspace
requests and converts them into messages sent to the firmware running on the
Cortex-M in the subsystem. The firmware in turn parses the messages and
performs the actions required to fulfill the requests e.g. running an inference.

Messages are passed to and from the firmware using the rpmsg communication
channel that is set up by the firmware when it boots up.

To get access to the rpmsg communication channel, the NPU kernel driver
registers that it needs the channel with the rpmsg driver in the Linux kernel
when it is loaded. When the channel becomes available, the Linux kernel
notifies the NPU kernel driver and communication between the driver and
firmware starts.

Please refer to [ethosu_rpmsg_mailbox.c](kernel/rpmsg/ethosu_rpmsg_mailbox.c) for more details about these
messages.

### Folder structure

To make a clear distinction between the implementation of the two hardware
configurations in the kernel driver, the source and header files are separated
into configuration specific subfolders, and a common subfolder for everything
that is shared between them.

* **common:** Files used by both the configurations
* **direct:** Files for the direct drive configuration
* **rpmsg:** Files for the subsystem configuration
* **include:** Header files for the driver and UAPI

## Device tree

This section contains references for how the device tree is expected to look
for the direct drive and subsystem configurations.

### Direct drive configuration

The below device tree can be used as a reference for an NPU in the direct drive
configuration.

**Note:** the `ethosu_mem_config` and `ethosu_axi_config` nodes are mutually
exclusive and only one should appear in the device tree depending on the NPU
used.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the SRAM used by the NPU
        ethosu_sram: ethosu_sram@6c000000 {
            reg = <0 0x6c000000 0 0x200000>;
            no-map;
        };
        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };
    };
    // NPU driver
    ethosu@6d700000 {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "arm,ethosu-direct";
        // Base address and size of NPU registers
        reg = <0 0x6d700000 0 0x2FFF>;
        memory-region = <&ethosu_reserved>;
        sram = <&ethosu_sram>;
        // Address mappings to translate between bus addresses (NPU) and physical host CPU addresses
        dma-ranges = <0 0x6c000000 0 0x6c000000 0 0x200000>,
                     <0 0x84000000 0 0x84000000 0 0x2000000>;
        interrupts = <0 168 4>;
        interrupt-names = "irq";
        // Memory region configuration
        region-cfgs = <3 3 0 3 3 3 3 3>;

        // Memory regions used for the command stream
        cs-region = <2>;

        // Memory interface configuration for Ethos-U85
        ethosu_mem_config {
            compatible = "arm,ethosu-mem-config";
            // <beats outstanding_read outstanding_write>
            sram = <0 64 32>;
            ext  = <1 64 32>;
            // <mem_domain mem_type axi_port>
            configs = <0 0 0>,
                      <0 0 0>,
                      <0 0 1>,
                      <0 0 1>;
        };
        // Memory interface configuration for Ethos-U65
        ethosu_axi_config {
            compatible = "arm,ethosu-axi-config";
            // AXI port0 <beats mem_type outstanding_read outstanding_write>
            // AXI port0 <beats mem_type outstanding_read outstanding_write>
            // AXI port1 <beats mem_type outstanding_read outstanding_write>
            // AXI port1 <beats mem_type outstanding_read outstanding_write>
            configs = <0 0 64 32>,
                      <0 0 64 32>,
                      <0 0 64 32>,
                      <0 0 64 32>;
        };
    };
};
```

### Subsystem configuration

The below device tree can be used as a reference for an NPU in the subsystem configuration.

```
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        // Memory region for the Cortex-M firmware
        ethosu_sram: ethosu_sram@6cf00000 {
            compatible = "shared-dma-pool";
            reg = <0 0x6cf00000 0 0x00100000>;
        };

        // Memory region for DMA buffer allocation
        ethosu_reserved: ethosu_reserved@84000000 {
            compatible = "shared-dma-pool";
            reg = <0 0x84000000 0 0x20000000>;
            no-map;
        };

        // Memory region used by remoteproc to allocate tensor area,
        // shared buffers etc for the Cortex-M firmware
        ethosu_ddr: ethosu_ddr@A4000000 {
            compatible = "shared-dma-pool";
            reg = <0 0xA4000000 0 0x1000000>;
        };
    };

    // Message Handling Unit (MHU). Please refer to the MHU driver for details
    mhuv1: mhu@6ca00000 {
        compatible = "arm,mhu_v1", "arm,primecell";
        reg = <0x0 0x6ca00000 0x0 0x1000>;
        interrupts = <0 168 4>;
        interrupt-names = "npu_rx";
        clocks = <&soc_refclk100mhz>;
        clock-names = "apb_pclk";
    };

    // Subsystem reset control. Please refer to the reset driver for details
    juno_ethosu_bridge: juno_ethosu_bridge@6f020000 {
        compatible = "arm,mali_fpga_sysctl";
        #reset-cells = <0x1>;
        reg = <0x0 0x6f020000 0x0 0x1000>;
    };

    // NPU driver
    ethosu {
        #address-cells = <2>;
        #size-cells = <2>;
        compatible = "simple-bus";
        ranges;

        // Address mappings to translate between bus addresses (Cortex-M) and physical host CPU addresses
        dma-ranges = <0 0x00000000 0 0x6cf00000 0 0x00100000>,
                     <0 0x60000000 0 0x80000000 0 0x25000000>;

        // Remoteproc driver
        ethosu-rproc {
            compatible = "arm,ethosu-rproc";

            // Memory regions for the firmware
            reg = <0 0x6cf00000 0 0x00100000>,
                  <0 0xA4000000 0 0x01000000>;
            reg-names = "rom", "shared";

            memory-region = <&ethosu_reserved>;

            // Mailbox IRQ communication
            mboxes = <&mhuv1 0>, <&mhuv1 0>;
            mbox-names = "tx", "rx";

            // Reset handler
            resets = <&juno_ethosu_bridge 0>;
        };
    };
};
```

## Inference flow

How an inference is managed and executed depends on the hardware configuration used.

### Direct drive

For the direct drive configuration, the NPU is directly managed by the NPU
kernel driver.

Unlike the subsystem configuration, where firmware running on the Cortex-M CPU
can be built with bundled networks, the direct drive configuration does not
have any firmware and therefore does not support bundled networks.

The sequence diagram below shows an overview of how the direct drive
implementation handles an inference.

![Direct drive sequence](docs/ethos-u_linux_direct_sequence.svg "Direct drive components and sequence")

### Subsystem

For the subsystem configuration, the NPU is fully managed by the firmware
running on the subsystem Cortex-M CPU. The NPU kernel driver is only
responsible for managing the requests from userspace and passing messages to
the subsystem to perform the required actions.

The firmware running on the Cortex-M is typically implemented using a real-time operating system (RTOS) and
the [OpenAMP](https://www.openampproject.org/) framework. OpenAMP includes support for the
[Remote Processor Framework (remoteproc)](https://www.kernel.org/doc/html/latest/staging/remoteproc.html) and
[Remote Processor Messaging (rpmsg)](https://www.kernel.org/doc/html/latest/staging/rpmsg.html), which are
used to set up and communicate with the subsystem.

If one or more networks will be used frequently with the NPU, they can be
bundled with the firmware and referenced with an index in the UAPI, to avoid
having to pass the network data later.

The sequence diagram below shows an overview of how the subsystem
implementation handles an inference.

![Subsystem sequence](docs/ethos-u_linux_sequence.svg "Subsystem (ML Island) components and sequence")

# Driver library

The purpose of the driver library is to provide user-friendly C++ and Python
APIs for dispatching inferences to the NPU kernel driver.

As the component diagram below illustrates, the network is separated from the
inference, allowing multiple inferences to share the same network. The buffer
class is used to store IFM and OFM data.

![Driver library](docs/driver_library_component.svg "Driver library component diagram")

# Ethos-U delegate for direct drive

The delegate library consists of a TFLite delegate that can offload the custom
Ethos-U operator in networks to the NPU when used in a direct drive
configuration.

The delegate is implemented using the driver library APIs to communicate with
the NPU kernel driver. It is provided as an external delegate shared library
that can be loaded into TFLite based applications.

For more information about delegates and how they are implemented,
please refer to the [LiteRT Delegates](https://ai.google.dev/edge/litert/performance/delegates) documentation.

# Running inferences

How to run inferences and how the networks need to be compiled depend on the
hardware configuration used.

For information about how to compile networks, please refer to the
[Vela](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela)
documentation.

## IFM and OFM data

The utility applications provided by the driver stack expect IFM data to be provided
as a single concatenated binary file where the inputs are in the order required
by the network.

If the IFM data is in NumPy format, it can be converted to a binary file using the example below.

```sh
python3 -c 'import numpy;i=numpy.load("input.npy");i.tofile("input.bin");'
```

If the IFM data is in separate binary files, the binary files can be concatenated using the example below.

```sh
cat input0.bin input1.bin > input.bin
```
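
One quick sanity check on a concatenated IFM file is that its size equals the
sum of its parts. The sketch below illustrates this with placeholder data in a
temporary directory. The file contents and sizes are made up for the example
and do not correspond to any real network.

```sh
# Work in a scratch directory so no real data is touched.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Placeholder inputs standing in for real IFM data (4 and 8 bytes).
printf 'AAAA' > input0.bin
printf 'BBBBBBBB' > input1.bin

# Concatenate in the order the network expects its inputs.
cat input0.bin input1.bin > input.bin

# The combined size should equal the sum of the individual sizes.
EXPECTED=$(( $(wc -c < input0.bin) + $(wc -c < input1.bin) ))
ACTUAL=$(( $(wc -c < input.bin) ))
echo "expected=$EXPECTED actual=$ACTUAL"
```

A size mismatch usually means an input was left out or included twice. Note
that this check cannot detect inputs given in the wrong order.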

The utility applications will provide the inference result as a single OFM file
that contains concatenated output binary data in the order specified by the
network.

## Direct drive configuration

In the direct drive configuration, only the NPU kernel driver module
(`ethosu.ko`) needs to be loaded into the Linux kernel to make the NPU
available for use.

To run inferences, the driver stack provides the `delegate_runner` utility
application. It is implemented using the TFLite library and supports using the
Ethos-U delegate to offload supported operators to the NPU.

#### Usage example

```sh
./delegate_runner -l libethosu_op_delegate.so -n <Network file> -i <IFM file> -o <OFM file>
```

For more information about the delegate runner, please see the help message in the application.

## Subsystem configuration

In the subsystem configuration, multiple kernel modules need to be loaded into
the Linux kernel to make the NPU available for use:

* Mailbox driver (`arm_mhu.ko` or equivalent for target system)
* Reset driver (`juno_fpga_reset.ko` or equivalent for target system)
* Remoteproc driver (`ethosu_remoteproc.ko`)
* NPU kernel driver (`ethosu.ko`)

**Note:** The driver modules should be loaded in the order specified above
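
The load order above could be scripted along the following lines. This is a
minimal sketch: the module file names are the examples from the list, the
`DRY_RUN` switch and the `load_modules` helper are hypothetical names
introduced here, and by default the commands are only printed.

```sh
# Module file names from the list above, in load order.
# Substitute the mailbox and reset drivers that match your target system.
MODULES="arm_mhu.ko juno_fpga_reset.ko ethosu_remoteproc.ko ethosu.ko"

# DRY_RUN=1 only prints the commands; set DRY_RUN=0 to load (requires root).
DRY_RUN=${DRY_RUN:-1}

load_modules() {
    for module in $MODULES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "insmod $module"
        else
            # Stop at the first failure so later modules are not loaded
            # without the drivers they depend on.
            insmod "$module" || return 1
        fi
    done
}

load_modules
```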

To run inferences, the driver stack provides the `inference_runner` utility
application. It is implemented using the driver library APIs to communicate
with the NPU kernel driver.

#### Usage example

```sh
./inference_runner -n <Network file> -i <IFM file> -o <OFM file>
```

For more information about the inference runner, please see the help message in the application.

# Building

The driver stack comes with a CMake based build system. A toolchain file is
provided as a reference on how to cross compile for AArch64 based systems. The
driver stack build system has been verified on the following:

* Ubuntu 22.04 LTS x86 64-bit Linux distribution
* Ubuntu 22.04 LTS Arm64 Linux distribution

Note that if your host system provides cross compilers and libraries of newer
versions than what is supported on your target system, you might be required to
download an older version of compilers and toolchains for your target system.
While out of scope for this README, an example
[toolchain file](cmake/toolchain/aarch64-linux-gnu-custom.cmake) is provided to
show what it could look like. Another option is to run a Docker image of an
appropriate Linux distribution suited to build for your needs.

Building the kernel modules requires a configured Linux kernel source tree and
a minimum Sparse version matching commit `0196afe16a50c76302921b139d412e82e5be2349`.
Please refer to the Linux kernel official documentation for instructions on how
to configure and build the Linux kernel and Sparse.

```
$ cmake -B build --toolchain $PWD/cmake/toolchain/aarch64-linux-gnu.cmake -DKDIR=<Kernel directory>
$ cmake --build build
```

## Compiler flags used

Refer to the appropriate toolchain file and the corresponding document for
a list of compiler flags used.

# Tested kernel versions

The Linux driver stack has been tested and validated with the following Linux kernel
versions:

* v5.16.20
* v5.19.14
* v6.1.134

# Licenses

The kernel drivers are provided under a GPL v2 license. All other software
components are provided under an Apache 2.0 license.

Please see [LICENSE-APACHE-2.0.txt](LICENSE-APACHE-2.0.txt) and
[LICENSE-GPL-2.0.txt](LICENSE-GPL-2.0.txt) for more information.

The [Userspace API (UAPI)](kernel/uapi/ethosu.h) has a
'WITH Linux-syscall-note' exception to the license. Please see
[Linux-syscall-note](Linux-syscall-note.txt) for more information.

# Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give
us your permission. For this process we use the Developer Certificate of Origin
(DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your
contribution by adding a line with your name and e-mail address to every git
commit message. You must use your real name; no pseudonyms or anonymous
contributions are accepted. If there is more than one contributor, everyone
adds their name and e-mail to the commit message.

```
Author: John Doe <john.doe@example.org>
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

The contributions will be code reviewed by Arm before they can be accepted into
the repository.

To submit a contribution, open a merge request against the
[linux_driver_stack](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-linux-driver-stack)
repository. To do this you will need to sign up at [gitlab.arm.com](https://gitlab.arm.com)
and add your SSH key under your settings.

# Security

## Limit access to Ethos-U driver device

The Linux driver stack does not provide any access control to the character
device created by the Ethos-U driver. It is up to the user of the Ethos-U
driver to restrict which applications have access to the device.
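
One common way to do this on Linux is with ordinary file permissions on the
device node. The sketch below is an illustration only: the `DEVICE` variable is
a hypothetical name and defaults to a scratch file so that the snippet runs
without root. On a target it would point at the NPU character device, whose
path (for example something like `/dev/ethosu0`) is an assumption here and
should be checked on your system.

```sh
# DEVICE defaults to a scratch file so the sketch can run anywhere;
# on a target, set it to the NPU device node (path is system specific).
DEVICE=${DEVICE:-$(mktemp)}

# Allow read/write for owner and group only; no access for other users.
chmod 660 "$DEVICE"

# Show the resulting permissions.
ls -l "$DEVICE"
```

Combined with a dedicated group for applications that may use the NPU, this
limits access to members of that group. Permissions set with `chmod` on a
device node typically do not survive a reboot, so a udev rule is the usual way
to make such a setting persistent.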

## Unrestricted NPU memory access

The NPU does not come with any hardware to restrict what memory locations it
can access. It is up to the user to provide and configure such hardware in the
system to restrict memory access.

## Report security related issues

Please see [Security](SECURITY.md).

# Trademark notice

Arm, Cortex and Ethos are registered trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.