# Changelog

All notable changes to the Arm RAN Acceleration Library (ArmRAL) project will be
documented in this file.

## [Unreleased]

### Added

### Changed
- Moved `license_terms/BSD-3-Clause.txt` and
`license_terms/third_party_licenses.txt` to
[LICENSE.md](https://gitlab.arm.com/networking/ral/-/blob/main/LICENSE.md) and
[THIRD_PARTY_LICENSES.md](https://gitlab.arm.com/networking/ral/-/blob/main/THIRD_PARTY_LICENSES.md)
respectively.

### Deprecated

### Removed

### Fixed

### Security

## [24.01] - 2024-01-19

### Changed
- Extended `armral_cmplx_pseudo_inverse_direct_f32` and
`armral_cmplx_pseudo_inverse_direct_f32_noalloc` to compute the regularized
pseudo-inverse of a single complex 32-bit matrix of size `M-by-N` for cases
where `M > N` in addition to the cases where `M <= N`.

- Improved performance of `armral_turbo_decode_block` and
`armral_turbo_decode_block_noalloc`.

- Improved SVE2 performance of `armral_seq_generator`, for the cases when
`sequence_len` is not a multiple of 64.
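
For context on `armral_seq_generator`, the following is a minimal reference sketch of a length-31 Gold sequence. It assumes the generator follows the pseudo-random sequence definition in 3GPP TS 38.211 section 5.2.1, which this changelog does not state. The function name `gold_sequence` and its parameters are hypothetical and are not part of the ArmRAL API. The sketch stores one bit per list entry and does not reflect the library's optimized implementation.

```python
def gold_sequence(c_init, length, nc=1600):
    """Return `length` bits of a length-31 Gold sequence (illustrative only).

    Assumes the recurrences of 3GPP TS 38.211 section 5.2.1:
      x1(n + 31) = (x1(n + 3) + x1(n)) mod 2
      x2(n + 31) = (x2(n + 3) + x2(n + 2) + x2(n + 1) + x2(n)) mod 2
      c(n)       = (x1(n + nc) + x2(n + nc)) mod 2
    """
    total = nc + length
    x1 = [0] * (total + 31)
    x2 = [0] * (total + 31)

    # x1 starts from a fixed state; x2 is seeded with the 31 bits of c_init.
    x1[0] = 1
    for i in range(31):
        x2[i] = (c_init >> i) & 1

    # Advance both linear feedback shift registers.
    for n in range(total):
        x1[n + 31] = (x1[n + 3] + x1[n]) % 2
        x2[n + 31] = (x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2

    # Discard the first nc outputs and combine the two m-sequences.
    return [(x1[n + nc] + x2[n + nc]) % 2 for n in range(length)]


# A length that is not a multiple of 64.
bits = gold_sequence(c_init=12345, length=70)
```

Requesting 70 bits exercises a length that is not a multiple of 64. The first 64 bits match those of a 64-bit request with the same seed.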

### Fixed
- LDPC block encoding (`armral_ldpc_encode_block`), rate matching
(`armral_ldpc_rate_matching`) and rate recovery (`armral_ldpc_rate_recovery`),
and the corresponding channel simulator, now support the insertion and removal
of filler bits as described in the 3GPP Technical Specification (TS) 38.212.
From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g).

## [23.10] - 2023-10-06

### Changed
- Extended the `sequence_len` parameter of `armral_seq_generator` to `uint32_t`.
From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g).

- Added parameter `i_bil` to `armral_polar_rate_matching` and
`armral_polar_rate_recovery` to enable or disable bit interleaving. From
[@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g).

- Added parameter `nref` to `armral_ldpc_rate_matching` and
`armral_ldpc_rate_recovery` to enable the functions to be used with a soft
buffer size. From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g).

- Improved Neon performance of Polar block decoding
(`armral_polar_decode_block`) for list sizes 1, 2, 4 and 8.

- Improved Neon performance of LDPC block decoding (`armral_ldpc_decode_block`
and `armral_ldpc_decode_block_noalloc`).

- Simulation programs are now built by default and are tested by the
`make check` target.

## [23.07] - 2023-07-07

### Added
- New function to compute the regularized pseudo-inverse of a single complex
32-bit floating-point matrix (`armral_cmplx_pseudo_inverse_direct_f32`).

- New function to compute the multiplication of a complex 32-bit floating-point
matrix with its conjugate transpose (`armral_cmplx_mat_mult_aah_f32`).

- New function to compute the complex 32-bit floating-point multiplication of
the conjugate transpose of a matrix with a matrix
(`armral_cmplx_mat_mult_ahb_f32`).

- Variants of existing functions which take a pre-allocated buffer rather than
performing memory allocations internally. For functions where the buffer size is
not easily calculated from the input parameters, helper functions to calculate
the required size have been provided.

- Neon-optimized implementation of batched complex 32-bit floating-point
matrix-vector multiplication (`armral_cmplx_mat_vec_mult_batch_f32`).

- SVE2-optimized implementation of complex 32-bit floating-point general matrix
inverse for matrices of size `2x2`, `3x3` and `4x4`
(`armral_cmplx_mat_inverse_f32`).

### Changed
- Improved Neon and SVE2 performance of Mu law compression
(`armral_mu_law_compr_8bit`, `armral_mu_law_compr_9bit`, and
`armral_mu_law_compr_14bit`).

- Improved Neon performance of 8-bit block float compression
(`armral_block_float_compr_8bit`).

- Improved SVE2 performance of 9-bit block scaling decompression
(`armral_block_scaling_decompr_9bit`).

- Improved SVE2 performance of 14-bit block scaling decompression
(`armral_block_scaling_decompr_14bit`).

- Improved SVE2 performance of 8-bit and 12-bit block float compression
(`armral_block_float_compr_8bit` and `armral_block_float_compr_12bit`).

- Moved the definition of the symbol rate out of the `ebn0_to_snr` function
(`simulation/awgn/awgn.cpp`) so that it is now a parameter that gets passed in
by each of the simulation programs.

- Updated the `convolutional_awgn` simulation program to use OpenMP
(`simulation/convolutional_awgn/convolutional_awgn.cpp`).

- Updated simulation programs to accept a path to write graphs to, instead of
auto-generating filenames.

- Added the maximum number of iterations to the output of the Turbo simulation
program (`simulation/turbo_awgn/turbo_error_rate.py`).

- Updated formatting of labels in simulation graph legends.
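
As background to the symbol rate change in `ebn0_to_snr`, the sketch below shows one common way to convert Eb/N0 to SNR. It is not the code in `simulation/awgn/awgn.cpp`. The function name `ebn0_db_to_snr_db` and all of its parameters are hypothetical, and the noise bandwidth is assumed to equal the symbol rate unless given explicitly.

```python
import math


def ebn0_db_to_snr_db(ebn0_db, bits_per_symbol, code_rate, symbol_rate,
                      bandwidth=None):
    """Convert Eb/N0 in dB to SNR in dB (illustrative only).

    Uses SNR = (Eb/N0) * (bit_rate / bandwidth), where
    bit_rate = symbol_rate * bits_per_symbol * code_rate.
    """
    if bandwidth is None:
        # Assumption: the noise bandwidth equals the symbol rate.
        bandwidth = symbol_rate
    bit_rate = symbol_rate * bits_per_symbol * code_rate
    return ebn0_db + 10.0 * math.log10(bit_rate / bandwidth)


# QPSK (2 bits per symbol) at code rate 1/2: one information bit per
# symbol, so SNR and Eb/N0 coincide under the assumption above.
snr_db = ebn0_db_to_snr_db(0.0, bits_per_symbol=2, code_rate=0.5,
                           symbol_rate=1.0e6)
```

With the bandwidth equal to the symbol rate, the spectral efficiency is at most the number of bits per symbol.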

### Fixed
- Removed bandwidth scaling in all simulation programs so that the maximum
spectral efficiency does not exceed the number of bits per symbol.

- Convolutional decoding algorithm
(`armral_tail_biting_convolutional_decode_block`) now returns correct results
for input lengths greater than 255.

- Test file for convolutional decoding (`test/ConvCoding/decoding/main.cpp`) is
updated so that the tests pass as expected for input lengths which are not a
multiple of 4.

- Neon block float decompression functions (`armral_block_float_decompr_8bit`,
`armral_block_float_decompr_9bit`, `armral_block_float_decompr_12bit`, and
`armral_block_float_decompr_14bit`) now truncate values before storing rather
than rounding them. This means the Neon implementations of these functions now
have the same behavior as the SVE implementations.

- Neon block scaling decompression functions
(`armral_block_scaling_decompr_8bit`, `armral_block_scaling_decompr_9bit`, and
`armral_block_scaling_decompr_14bit`) now truncate values before storing rather
than rounding them. This means the Neon implementations of these functions now
have the same behavior as the SVE implementations.

## [23.04] - 2023-04-21

### Added
- Cyclic Redundancy Check (CRC) attachment function
(`armral_polar_crc_attachment`) for Polar codes, described in section 5.2.1 of
the 3GPP Technical Specification (TS) 38.212.

- CRC function to check the validity of the output(s) of Polar decoding
(`armral_check_crc_polar`).

- New simulation program `modulation_awgn` which plots the error rate versus
Eb/N0 (or signal-to-noise ratio (SNR)) of taking a hard demodulation decision
for data sent over a noisy channel with no forward error correction.

- Added a field called `snr` to the JSON output of all simulation programs,
which stores the signal-to-noise ratio.

- Added a flag called `x-unit` to all plotting scripts which allows the user to
choose whether Eb/N0 or SNR is plotted on the x-axis.

- Added CRC attachment and check in Polar codes simulation.
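
The attach-then-check pattern behind the CRC entries above can be sketched as follows. This is an illustration only: the helper names are hypothetical and are not ArmRAL functions, and the choice of the `gCRC11` generator polynomial from 3GPP TS 38.212 section 5.1 is an assumption, since this changelog does not say which polynomial the library uses.

```python
# Generator polynomial gCRC11(D) = D^11 + D^10 + D^9 + D^5 + 1 from
# 3GPP TS 38.212 section 5.1, most significant coefficient first.
# Assumption: chosen for illustration only.
CRC11_POLY = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]


def _remainder(bits, poly):
    """Remainder of GF(2) polynomial long division of `bits` by `poly`."""
    parity_len = len(poly) - 1
    work = list(bits)
    for i in range(len(work) - parity_len):
        if work[i]:
            # Subtract (XOR) the polynomial aligned at position i.
            for j, coeff in enumerate(poly):
                work[i + j] ^= coeff
    return work[-parity_len:]


def crc_attach(data_bits, poly=CRC11_POLY):
    """Append CRC parity bits to a list of data bits."""
    padded = list(data_bits) + [0] * (len(poly) - 1)
    return list(data_bits) + _remainder(padded, poly)


def crc_check(codeword_bits, poly=CRC11_POLY):
    """Return True if the codeword (data plus parity) has a zero remainder."""
    return not any(_remainder(codeword_bits, poly))


data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
codeword = crc_attach(data)
ok = crc_check(codeword)
```

A decoder that produces several candidate outputs, as a list decoder does, can run a check like `crc_check` on each candidate and keep one that passes.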

### Changed

- Updated
[license terms](https://gitlab.arm.com/networking/ral/-/blob/main/license_terms/BSD-3-Clause.txt)
to BSD-3-Clause.

- Updated Polar decoding (`armral_polar_decode_block`) to accept a list size of
8.

- LDPC decoding (`armral_ldpc_decode_block`) can optionally make use of attached
CRC information to terminate iteration early in the case that a match is found.

- Improved Neon performance of tail-biting convolutional encoder for LTE
(`armral_tail_biting_convolutional_encode_block`).

- Improved Neon performance of tail-biting convolutional decoder for LTE
(`armral_tail_biting_convolutional_decode_block`).

### Fixed
- Calculation of the encoded data length in the LDPC simulation program
(`armral/simulation/ldpc_awgn/ldpc_error_rate.py`) is updated to match that used
in Arm RAN Acceleration Library.

- Graphs generated from results of simulation programs in the simulation
directory no longer plot Shannon limits and theoretical maxima versus block
error rates. Shannon limits and theoretical maxima continue to be plotted for
bit error rates.

## [23.01] - 2023-01-27

### Added
- Rate matching for Turbo coding (`armral_turbo_rate_matching`). This implements
the operations in section 5.1.4.1 of the 3GPP Technical Specification (TS)
36.212.

- Rate recovery for Turbo coding (`armral_turbo_rate_recovery`). This implements
the inverse operations of rate matching. Rate matching is described in section
5.1.4.1 of the 3GPP Technical Specification (TS) 36.212.

- Tail-biting convolutional encoder for LTE
(`armral_tail_biting_convolutional_encode_block`).

- Tail-biting convolutional decoder for LTE
(`armral_tail_biting_convolutional_decode_block`).

- Scrambling for Physical Uplink Control Channels (PUCCH) formats 2, 3 and 4,
Physical Downlink Shared Channel (PDSCH), Physical Downlink Control Channel
(PDCCH), and Physical Broadcast Channel (PBCH) (`armral_scramble_code_block`).
This covers scrambling as described in 3GPP Technical Specification (TS) 38.211,
sections 6.3.2.5.1, 6.3.2.6.1, 7.3.1.1, 7.3.2.3, and 7.3.3.1.

- Simulation program for LTE tail-biting convolutional coding
(`armral/simulation/convolutional_awgn`).

- Python script that allows users to draw the data rates of each modulation and
compare them to the capacity of the AWGN channel
(`armral/simulation/capacity/capacity.py`).

- SVE2-optimized implementation of complex 32-bit floating-point matrix-vector
multiplication (`armral_cmplx_mat_vec_mult_f32`).

- SVE2-optimized implementation of 14-bit block scaling decompression
(`armral_block_scaling_decompr_14bit`).

### Changed
- Modified error rate Python scripts (under `armral/simulation`) to use Eb/N0 as
x-axis (instead of the SNR) and to show the Shannon limits.

- Added Turbo rate matching and recovery to the Turbo simulation program
(`armral/simulation/turbo_awgn/turbo_awgn.cpp`).

- Improved Neon performance of block-float decompression for 9-bit and 14-bit
block-float representations (`armral_block_float_decompr_9bit` and
`armral_block_float_decompr_14bit`).

- Improved Neon performance of complex 32-bit floating-point matrix-vector
multiplication (`armral_cmplx_mat_vec_mult_f32`).

- Improved Neon performance of Gold sequence generator (`armral_seq_generator`).

- Improved Neon performance of general matrix inversion
(`armral_cmplx_mat_inverse_f32`).

- Improved Neon performance of batched general matrix inversion
(`armral_cmplx_mat_inverse_batch_f32`).
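
The Shannon limits and AWGN channel capacity mentioned for the simulation scripts follow from the standard capacity formula. The sketch below is not taken from the scripts under `armral/simulation`; the function names are hypothetical, and it assumes a real-valued spectral efficiency in bits per channel use.

```python
import math


def awgn_capacity(snr_db):
    """Capacity of the AWGN channel in bits per channel use.

    Uses C = log2(1 + SNR), with the SNR given in dB.
    """
    snr = 10.0 ** (snr_db / 10.0)
    return math.log2(1.0 + snr)


def shannon_limit_ebn0_db(spectral_efficiency):
    """Minimum Eb/N0 in dB for reliable transmission at a given efficiency.

    Setting the rate equal to capacity, with SNR = eta * Eb/N0, gives
    Eb/N0 >= (2^eta - 1) / eta.
    """
    eta = spectral_efficiency
    return 10.0 * math.log10((2.0 ** eta - 1.0) / eta)


limit_db = shannon_limit_ebn0_db(1.0)  # 1 bit per channel use
```

As the spectral efficiency tends to zero, the limit approaches about -1.59 dB, the lowest Eb/N0 at which reliable communication over this channel is possible.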

### Fixed
- Documentation of the interface for Polar rate recovery
(`armral_polar_rate_recovery`) updated to reflect how the parameters are used
in the implementation.

## [22.10] - 2022-10-07

### Added
- SVE2-optimized implementations of `2x2` and `4x4` matrix multiplication
functions where in-phase and quadrature components are separated
(`armral_cmplx_mat_mult_2x2_f32_iq` and `armral_cmplx_mat_mult_4x4_f32_iq`).

### Changed
- The program to evaluate the error-correction performance of Polar coding in
the presence of additive white Gaussian noise (AWGN) located in
`simulation/polar_awgn` is updated to no longer take the length of a code block
as a parameter.

- Improved the Neon and SVE2 performance of LDPC encoding for a single code
block (`armral_ldpc_encode_block`).

- Improved the Neon performance of Turbo decoding for a single code block
(`armral_turbo_decode_block`).

- Improved the Neon performance of Turbo encoding for a single code block
(`armral_turbo_encode_block`).

- Improved the Neon performance of 32-bit floating-point general matrix
inversion (`armral_cmplx_mat_inverse_f32`).

- Improved the Neon performance of 32-bit floating-point batch general matrix
inversion (`armral_cmplx_mat_inverse_batch_f32` and
`armral_cmplx_mat_inverse_batch_f32_pa`).

### Fixed
- The Turbo coding simulation program now builds when performing an SVE build of
the library.

## [22.07] - 2022-07-15

### Added
- SVE2-optimized implementation of equalization with four subcarriers
(`armral_solve_*x*_4sc_f32`).

- Matrix-vector multiplication functions for batches of 32-bit complex
floating-point matrices and vectors (`armral_cmplx_mat_vec_mult_batch_f32` and
`armral_cmplx_mat_vec_mult_batch_f32_pa`).

- LTE Turbo encoding function (`armral_turbo_encode_block`) that implements the
encoding scheme defined in section 5.1.3.2 of the 3GPP Technical Specification
(TS) 36.212 "Multiplexing and channel coding".

- LTE Turbo decoding function (`armral_turbo_decode_block`) that implements a
maximum a posteriori (MAP) algorithm to return a hard decision (either 0 or 1)
for each output bit.

- Functions to perform rate matching and rate recovery for Polar coding. These
implement the specification in section 5.4.1 of the 3GPP Technical Specification
(TS) 38.212.

- Functions to perform rate matching and rate recovery for LDPC coding. This
implements the specification in section 5.4.2 of the 3GPP Technical
Specification (TS) 38.212.

- Utilities to simulate the error correction performance for Polar, LDPC and
Turbo coding over a noisy channel.

### Changed
- Renamed the Polar encoding and decoding functions to
`armral_polar_encode_block` and `armral_polar_decode_block`.

- Improved the Neon and SVE2 performance of 16-QAM modulation
(`armral_modulation` with `armral_modulation_type` set to `ARMRAL_MOD_16QAM`).

- Improved the SVE2 performance of Mu law compression and decompression
(`armral_mu_law_compr_*` and `armral_mu_law_decompr_*`).

- Improved the SVE2 performance of block float compression and decompression
(`armral_block_float_compr_*` and `armral_block_float_decompr_*`).

- Improved the SVE2 performance of 8-bit block scaling compression
(`armral_block_scaling_compr_8bit`).

- Improved the performance of 32-bit floating-point and 16-bit fixed-point
complex valued FFTs (`armral_fft_execute_cf32` and `armral_fft_execute_cs16`)
with large prime factors.

## [22.04] - 2022-04-08

### Added
- SVE2-optimized implementations of batched 16-bit fixed-point matrix-vector
multiplication with 64-bit and 32-bit fixed-point accumulator
(`armral_cmplx_mat_vec_mult_batch_i16`,
`armral_cmplx_mat_vec_mult_batch_i16_pa`,
`armral_cmplx_mat_vec_mult_batch_i16_32bit`,
`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`).

- SVE2-optimized implementation of complex 32-bit floating-point singular value
decomposition (`armral_svd_cf32`).

- SVE2-optimized implementations of complex 32-bit floating-point Hermitian
matrix inversion for a single matrix or a batch of matrices of size `3x3`
(`armral_cmplx_hermitian_mat_inverse_f32` and
`armral_cmplx_hermitian_mat_inverse_batch_f32`).

- SVE2-optimized implementations of 9-bit and 14-bit Mu law compression
(`armral_mu_law_compr_9bit` and `armral_mu_law_compr_14bit`).

- SVE2-optimized implementations of 9-bit and 14-bit Mu law decompression
(`armral_mu_law_decompr_9bit` and `armral_mu_law_decompr_14bit`).

- Complex 32-bit floating-point general matrix inversion for matrices of size
`2x2`, `3x3`, `4x4`, `8x8`, and `16x16` (`armral_cmplx_mat_inverse_f32`).

### Changed
- Improved the performance of batched 16-bit fixed-point matrix-vector
multiplication with 64-bit fixed-point accumulator
(`armral_cmplx_mat_vec_mult_batch_i16` and
`armral_cmplx_mat_vec_mult_batch_i16_pa`).

- Improved the performance of batched 16-bit fixed-point matrix-vector
multiplication with 32-bit fixed-point accumulator
(`armral_cmplx_mat_vec_mult_batch_i16_32bit` and
`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`).

- Improved the performance of 14-bit block float compression
(`armral_block_float_compr_14bit`).

- Improved the performance of 14-bit block scaling compression
(`armral_block_scaling_compr_14bit`).

- Improved the performance of 14-bit Mu law compression
(`armral_mu_law_compr_14bit`).

- Improved the performance of complex 32-bit floating-point singular value
decomposition (`armral_svd_cf32`). The input matrix now needs to be stored in
column-major order. Output matrices are also returned in column-major order.

- Improved the performance of complex 32-bit floating-point Hermitian matrix
inversion for a single matrix or a batch of matrices of size `3x3`
(`armral_cmplx_hermitian_mat_inverse_f32` and
`armral_cmplx_hermitian_mat_inverse_batch_f32`).

- Improved the performance of Polar list decoding (`armral_polar_decoder`) with
list size 4. The performance for list size 1 is slightly reduced, but list size
4 gives much better error correction.

- Added restrictions to the number of matrices and vectors in the batch for the
functions that perform batched matrix-vector multiplications in fixed-point
precision (`armral_cmplx_mat_vec_mult_batch_i16`,
`armral_cmplx_mat_vec_mult_batch_i16_pa`,
`armral_cmplx_mat_vec_mult_batch_i16_32bit`,
`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`).

- The function to perform fixed-point complex matrix-matrix multiplication with
a 64-bit accumulator (`armral_cmplx_mat_mult_i16`) now narrows from the 64-bit
accumulator to a 32-bit intermediate value, and then to the 16-bit result using
truncating narrowing operations instead of rounding operations. This matches the
behavior in the fixed-point complex matrix-matrix multiplication with a 32-bit
accumulator.

- The function to perform fixed-point complex matrix-vector multiplication with
a 64-bit accumulator (`armral_cmplx_mat_vec_mult_i16`) now narrows from the
64-bit accumulator to a 32-bit intermediate value, and then to the 16-bit result
using truncating narrowing operations instead of rounding operations. This
matches the behavior in the fixed-point complex matrix-vector multiplication
with a 32-bit accumulator.
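
The difference between truncating and rounding narrowing can be sketched with plain integer shifts. This is an illustration of the two behaviors only: the function names and shift amounts are hypothetical, saturation is omitted, and the code makes no claim about the exact shifts used in `armral_cmplx_mat_mult_i16` or `armral_cmplx_mat_vec_mult_i16`.

```python
def truncating_narrow(value, shift):
    """Drop the low `shift` bits (arithmetic shift, rounds toward -infinity)."""
    return value >> shift


def rounding_narrow(value, shift):
    """Add half of the discarded range before shifting (round to nearest)."""
    return (value + (1 << (shift - 1))) >> shift


# 7 / 4 = 1.75: truncation gives 1, rounding gives 2.
truncated = truncating_narrow(7, 2)
rounded = rounding_narrow(7, 2)

# Two truncating stages, as in a 64-bit to 32-bit to 16-bit narrowing,
# give the same result as one truncation by the combined shift.
two_stage = truncating_narrow(truncating_narrow(1000, 3), 2)
one_stage = truncating_narrow(1000, 5)
```

Two truncating stages always agree with a single truncation by the combined shift, whereas two rounding stages can differ from a single rounding.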