# Arm(R) Ethos(TM)-U core platform
Arm(R) Ethos(TM)-U core platform is provided as an example of how to produce a
firmware binary for a given target platform. This software is primarily intended
for guidance, to demonstrate how to boot up a firmware binary and how to run an
inference on an Arm Ethos-U compatible platform.

This repository contains target specific files, like linker scripts. Target
agnostic software components are provided in the
[core_software](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-software.git)
repository.

# Arm(R) Corstone(TM)-300

[Arm(R) Corstone(TM)-300](https://developer.arm.com/ip-products/subsystem/corstone/corstone-300)
is a reference design of how to build a secure System on Chip (SoC). A Fixed
Virtual Platform (FVP) of the Arm Corstone-300 including the Arm Ethos-U can be
downloaded from the Ecosystem page at
[developer.arm.com](https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps).

## Building

Building the core platform requires a recent version of CMake to be installed
together with a compiler capable of cross-compiling for Arm Cortex-M. Sample
toolchain files are provided for Arm Clang and Arm GCC.

To run the helper scripts, Python 3 is required together with the packages
listed in `requirements.txt`.

```
$ pip install -U pip
$ pip install -r requirements.txt
```

The following commands will produce an ELF file which can be run on the FVP.

```
$ cmake -B build targets/corstone-300
$ cmake --build build
```

It is also possible to build with a different toolchain.

```
$ cmake -B build targets/corstone-300 -DCMAKE_TOOLCHAIN_FILE=$PWD/cmake/toolchain/arm-none-eabi-gcc.cmake
$ cmake --build build
```

Please see [README_WINDOWS.md](README_WINDOWS.md) for additional information
regarding building on a Windows system.

## run_platform.py

There are many things to consider when deploying a network to an embedded
system. Where should the data be placed, in SRAM, DRAM or flash? How is the
performance affected if a fast or slower memory is used? Which Ethos-U
performance counters should be enabled to measure the performance?

The main purpose of `scripts/run_platform.py` is to document how to go from
tflite to an application that can be run on an embedded platform like
Corstone-300. It also allows users to adjust some settings like memory
configuration, timing adapter settings or which PMU events to monitor. Please
refer to the help message for further details about which arguments can be
passed to the script.

```
$ scripts/run_platform.py --network-path <tflite>
```

## Corstone-300 FVP

Assuming that the Corstone-300 FVP has been downloaded, installed and added to
the PATH variable, the software binaries can be tested like this.

```
$ ctest
```

Individual applications can also be run directly with the FVP, for example like
this.

```
$ FVP_Corstone_SSE-300_Ethos-U55 applications/freertos/freertos.elf
```

The Corstone-300 FVP allows some parameters to be modified, for example the
number of Ethos-U MAC units can be configured with
`-C ethosu.num_macs=<64|128|256|...>`. Please note that the network must be
recompiled with Vela if the MAC configuration changes. Please also note that the
set of valid MAC configurations is different for Ethos-U55 and Ethos-U65.

```
$ FVP_Corstone_SSE-300_Ethos-U55 -C ethosu.num_macs=256 applications/freertos/freertos.elf
```

## Corstone-300 MPS3 FPGA

The files needed to get started for Corstone-300 can be found on
[developer.arm.com](https://developer.arm.com/tools-and-software/development-boards/fpga-prototyping-boards/download-fpga-images).

Follow the documentation in the downloaded archive to set up the board with the
Corstone-300 FPGA bit files.

The built files can then be run by adapting the steps in chapter *Software*,
using the extracted binary files from the build process. This is needed for the
boot loader on the FPGA to be able to load the memories.

1. Copy the binary files for the application in the `fw` folder to the board
   `<MPS3_dir>/SOFTWARE` folder, making sure each filename is at most 8
   characters long.
2. Navigate to `<MPS3_dir>/MB/HBI0309C/<version>` and open the `images.txt` file
   in a text editor.
3. Remove the lines under the `[IMAGES]` section and replace them with:

```
TOTALIMAGES: 2

IMAGE0ADDRESS: 0x01000000
IMAGE0UPDATE:  AUTO
IMAGE0FILE:    \SOFTWARE\10000000 ; ITCM secure

IMAGE1ADDRESS: 0x0c000000
IMAGE1UPDATE:  AUTO
IMAGE1FILE:    \SOFTWARE\70000000 ; DDR secure
```

The mapping between the Cortex-M55 address space and the addresses the FPGA MCC
boot loader needs can be found in section *MCC Memory mapping* of the
documentation in the Corstone-300 FPGA archive. A part of the table is shown
below.

| Cortex-M55  | MCC Bootloader | Name            |
|-------------|----------------|-----------------|
| 0x0000_0000 | 0x0000_0000    | ITCM non secure |
| 0x1000_0000 | 0x0100_0000    | ITCM secure     |
| 0x0100_0000 | 0x0200_0000    | SRAM non secure |
| 0x1100_0000 | 0x0300_0000    | SRAM secure     |
| 0x6000_0000 | 0x0800_0000    | DDR non secure  |
| 0x7000_0000 | 0x0c00_0000    | DDR secure      |

For example, the binary that the Cortex-M55 CPU expects at address 0x1000_0000
must therefore be written by the MCC to 0x0100_0000.
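
The address translation can be scripted. The sketch below is only an
illustration and not part of this repository: it copies the rows of the table
above into a dictionary and prints the lines to place under the `[IMAGES]`
section. The function name `images_section` is hypothetical, and the file
naming (each binary named after its Cortex-M55 base address as 8 hex digits) is
an assumption based on the listing above.

```python
# Cortex-M55 base address -> (MCC boot loader address, region name),
# copied from the table above.
MCC_MAP = {
    0x00000000: (0x00000000, "ITCM non secure"),
    0x10000000: (0x01000000, "ITCM secure"),
    0x01000000: (0x02000000, "SRAM non secure"),
    0x11000000: (0x03000000, "SRAM secure"),
    0x60000000: (0x08000000, "DDR non secure"),
    0x70000000: (0x0C000000, "DDR secure"),
}


def images_section(cpu_bases):
    """Return the text to place under [IMAGES] for the given base addresses.

    Assumption: each binary in the SOFTWARE folder is named after the
    Cortex-M55 base address it is linked to, written as 8 hex digits.
    """
    lines = [f"TOTALIMAGES: {len(cpu_bases)}"]
    for index, cpu_base in enumerate(cpu_bases):
        # A KeyError here means the address is not a known region base.
        mcc_address, name = MCC_MAP[cpu_base]
        lines += [
            "",
            f"IMAGE{index}ADDRESS: 0x{mcc_address:08x}",
            f"IMAGE{index}UPDATE:  AUTO",
            f"IMAGE{index}FILE:    \\SOFTWARE\\{cpu_base:08x} ; {name}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # ITCM secure and DDR secure, as in the listing above.
    print(images_section([0x10000000, 0x70000000]))
```

Only region base addresses are handled, because the table does not list region
sizes.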

Power up the board with the PBON button and the application output will be seen
on the serial console.

# Memory configurations

Embedded systems come in very different configurations, but typically they have
a limited amount of high-bandwidth, low-latency memory like SRAM, and a larger
amount of low-bandwidth, high-latency memory like flash or DRAM.

The TensorFlow Lite for Microcontrollers (TFLu) framework needs two buffers to
run an inference, the *model* and the *arena*. The model contains static data
like weights and biases. The arena contains read-write data like activations,
IFM, OFM, temporary data etc. Please note that the IFM and OFM are located
*inside* of the arena.

The placement of the model and arena has a big impact on the performance. There
are three configurations that make sense for most systems.

| Model      | Arena      | Spilling | Note           |
|------------|------------|----------|----------------|
| SRAM       | SRAM       | No       |                |
| Flash/DRAM | SRAM       | No       |                |
| Flash/DRAM | Flash/DRAM | Yes      | Ethos-U65 only |

## Model and arena in SRAM

For optimal performance both model and arena should be placed in SRAM.

## Model flash/DRAM, Arena SRAM

If the model and arena do not both fit in SRAM, then it makes most sense to move
the model to flash/DRAM. The performance penalty depends on the network and will
need to be measured. For example, weight bound networks will experience a larger
performance drop than MAC bound networks.

## Model and arena in flash/DRAM (Ethos-U65 only)

Moving both model and arena to flash/DRAM comes with quite a hefty performance
penalty. To mitigate some of this, *spilling* can be used.

Spilling means that a small buffer is reserved in SRAM that acts like a cache
for frequently accessed data. When spilling is enabled
[Vela](https://git.mlplatform.org/ml/ethos-u/ethos-u-vela.git/about/) will
prepend and append extra instructions to the command stream to DMA copy data
between the arena and the spilling buffer.

Some of the data stored in the spilling buffer must be copied back to the arena,
which is done as a DMA transfer over AXI 1. This is only supported by Ethos-U65,
because Ethos-U55 is equipped with a read-only AXI 1 interface.
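
The three configurations can be summarised as a small decision helper. This is
a rough sketch under simplifying assumptions, not a tool from this repository:
the function name `choose_placement` and its parameters are hypothetical, and
it ignores everything else that competes for SRAM (stacks, heap, the spilling
buffer itself). The performance impact still has to be measured.

```python
def choose_placement(model_bytes, arena_bytes, sram_bytes, npu="ethos-u55"):
    """Pick one of the three configurations from the table above.

    Returns (model_memory, arena_memory, spilling). Raises ValueError when
    none of the three configurations applies. All sizes are in bytes.
    """
    # Best case: everything fits in the fast memory.
    if model_bytes + arena_bytes <= sram_bytes:
        return ("SRAM", "SRAM", False)
    # Next best: keep the read-write arena fast, move the static model out.
    if arena_bytes <= sram_bytes:
        return ("Flash/DRAM", "SRAM", False)
    # Last resort: both in slow memory, with spilling (Ethos-U65 only).
    if npu == "ethos-u65":
        return ("Flash/DRAM", "Flash/DRAM", True)
    raise ValueError("arena does not fit in SRAM and spilling needs Ethos-U65")


if __name__ == "__main__":
    # Hypothetical sizes: 300 KiB model, 50 KiB arena, 200 KiB SRAM.
    print(choose_placement(300 * 1024, 50 * 1024, 200 * 1024))
```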

# Multi NPU

The TensorFlow Lite for Microcontrollers (TFLu) framework supports running
multiple parallel inferences. Each parallel inference requires a TFLu arena
(costs memory) and a stack (requires an RTOS). The examples provided in this
repo are implemented in the application layer, which means that any RTOS could
be used.

The Ethos-U NPU driver is implemented in plain C. To enable thread safety in a
multi-threading environment the driver defines a set of weak functions that the
application is expected to override, providing implementations for mutex and
semaphore primitives.

The weak functions can be found in
[ethosu_driver.c](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-driver.git/tree/src/ethosu_driver.c?id=35b5d0eebf9709a3439d362a0b53d6270cbc4a94#n173).
A FreeRTOS-based example of how to override and implement these functions can
be found in
[applications/freertos/main.cpp](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git/tree/applications/freertos/main.cpp?id=991af2bd8fb6c79dfb317837353857f34a727b17#n108).

The sequence diagram below illustrates the call stack for a multi NPU system.
Please note how the `ethosu_mutex_*` and `ethosu_semaphore_*` functions are
implemented in the application layer. Mutexes are used for thread safety and
semaphores for sleeping.

![Multi NPU](docs/multinpu.svg "Multi NPU sequence diagram")
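
The roles of the two primitives can be mimicked on a host machine. The sketch
below is a loose analogy in Python, not the driver's code: `FakeNpu` and
`NpuPool` are hypothetical names, a timer stands in for the NPU interrupt, and
the "inference" just doubles a number. It only shows a mutex guarding shared
state while semaphores put threads to sleep until an NPU is free or a job has
finished.

```python
import threading


class FakeNpu:
    """Stand-in for one NPU: a job 'completes' when the timer fires."""

    def __init__(self):
        self.irq = threading.Semaphore(0)  # given by the "interrupt"

    def run(self, job):
        # The timer plays the role of the NPU raising its interrupt.
        threading.Timer(0.01, self.irq.release).start()
        self.irq.acquire()  # sleep until the job is done
        return job * 2


class NpuPool:
    def __init__(self, count):
        self._mutex = threading.Lock()                # protects the free list
        self._available = threading.Semaphore(count)  # counts free NPUs
        self._free = [FakeNpu() for _ in range(count)]

    def infer(self, job):
        self._available.acquire()  # sleep until some NPU is free
        with self._mutex:
            npu = self._free.pop()
        try:
            return npu.run(job)
        finally:
            with self._mutex:
                self._free.append(npu)
            self._available.release()


def run_parallel(pool, jobs):
    """Run one thread per job, similar to one RTOS task per inference."""
    results = [None] * len(jobs)

    def worker(index):
        results[index] = pool.infer(jobs[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(jobs))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Four parallel "inferences" sharing two NPUs.
    print(run_parallel(NpuPool(2), [1, 2, 3, 4]))
```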

## Multi NPU tradeoffs

A single Cortex-M is capable of driving multiple Ethos-U NPUs. The optimal
number of NPUs is impossible to determine without knowing which network to run
and without detailed knowledge about the limitations of the embedded system.

Each parallel inference requires an arena. For optimal performance the arena
should be placed in a high-bandwidth, low-latency memory like SRAM, which is a
cost that has to be considered. The size of the arena varies greatly depending
on the network.

For networks that map fully to Ethos-U, the memory bandwidth might become a
limiting factor. For networks that run partly in software, the Cortex-M might
become the limiting factor. The placement of the TFLu model and arena (flash,
DRAM, SRAM, etc) will also have a big impact on the performance.

# Startup

The applications in this repo use
[CMSIS Device](https://github.com/ARM-software/CMSIS_5/tree/develop/Device/) to
start up the Cortex-M. The standard procedure is to copy and modify the CMSIS
templates, but in this repo we have chosen to include the unmodified templates
directly from CMSIS.

The sequence diagram below describes what happens after the Cortex-M reset is
lifted, up until the execution enters the application `main()`.

![Startup](docs/startup.svg "Startup sequence diagram")

## CMSIS Device

The first thing that happens is that the CPU loads index 0 from the interrupt
vector table into the SP register and index 1 into the PC register, and then
starts executing from the PC location.

Index 1 in the vector table is referred to as the reset handler and is
responsible for initializing the CPU. If the CPU for example has an FPU or MVE
extension, then these are enabled.
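
The first two vector table entries can be inspected on a host machine. The
sketch below is an illustration only: `read_reset_state` is a hypothetical
helper, the image is assumed to be little-endian and to start with the vector
table, and the example addresses are made up.

```python
import struct


def read_reset_state(image):
    """Return (initial_sp, entry_pc) from the first two vector table entries."""
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    # Bit 0 of the vector selects Thumb state and is not part of the address.
    return initial_sp, reset_vector & ~1


# Hypothetical 8-byte image: stack top 0x2000_1000, reset handler 0x1000_0100.
example = struct.pack("<II", 0x20001000, 0x10000101)

if __name__ == "__main__":
    sp, pc = read_reset_state(example)
    print(f"SP=0x{sp:08x} PC=0x{pc:08x}")
```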

## Compiler runtime

The entry function for the compiler runtime setup varies depending on which
compiler is used. For Arm Clang this function is called `__main()`, not to be
confused with the application `main()`!

The runtime is responsible for initializing the memory segments and setting up
the runtime environment. Please refer to the compiler documentation for detailed
information about the runtime setup.

## Target

The [`init()`](targets/common/src/init.cpp) function is defined as a
constructor, which is called before the application `main()`. We use this
constructor to run `targetSetup()` to initialize the platform.

For each target there is a `targets/<target>` directory, which contains linker
scripts and code needed to set up the target. `targetSetup()` is implemented in
this folder and is responsible for initializing drivers, configuring the MPU,
enabling caches etc.

Adding a new target would involve creating a new `targets/<target>` directory,
providing linker scripts and implementing `targetSetup()`.

## Application

Finally, the runtime calls the application `main()`. Ideally the application
code should be generic and have no knowledge about which target it is executing
on.

# License

The Arm Ethos-U core platform is provided under an Apache-2.0 license. Please
see [LICENSE.txt](LICENSE.txt) for more information.

# Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give
us your permission. For this process we use the Developer Certificate of Origin
(DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your
contribution by adding a line with your name and e-mail address to every git
commit message. You must use your real name; no pseudonyms or anonymous
contributions are accepted. If there is more than one contributor, everyone
adds their name and e-mail to the commit message.

```
Author: John Doe <john.doe@example.org>
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

The contributions will be code reviewed by Arm before they can be accepted into
the repository.

# Security

Please see [Security](SECURITY.md).

# Trademark notice

Arm, Cortex, Corstone and Ethos are registered trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere.