diff --git a/.gitlab/merge_request_templates/Bugfix.md b/.gitlab/merge_request_templates/Bugfix.md index bea5b43ffcb0e90257d1d9c5be5a55e492e9b455..e7bce9ba05de1ef04d06cc4c072ce72cda0eb2f4 100644 --- a/.gitlab/merge_request_templates/Bugfix.md +++ b/.gitlab/merge_request_templates/Bugfix.md @@ -8,6 +8,7 @@ If an [Issue](https://gitlab.arm.com/networking/ral/-/issues) already exists for * [] [Contribution meets RAL's licence terms](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-licensing-information) * [] [Documentation updated](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-documentation) +* [] ["Unreleased" section of the Changelog updated](https://gitlab.arm.com/networking/ral/-/blob/main/CHANGELOG.md#unreleased) * [] [`clang-format` and `clang-tidy` run and changes included (C/C++ code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-cc-code-style) * [] [`flake8` run and changes included (Python code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-python-code-style) * [] Commit message includes information on how to reproduce the issue(s) diff --git a/.gitlab/merge_request_templates/Default.md b/.gitlab/merge_request_templates/Default.md index 656f1d80e2e2caf224e129453f83608c379fadb6..7e12eddf190a0e6aae4d5bc03acca22a34754550 100644 --- a/.gitlab/merge_request_templates/Default.md +++ b/.gitlab/merge_request_templates/Default.md @@ -12,6 +12,7 @@ If this Merge Request addresses an [Issue](https://gitlab.arm.com/networking/ral * [] [New functions adhere to RAL's naming scheme](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-function-naming) * [] [Contribution conforms to RAL's directory structure](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-directory-structure) * [] [Documentation 
updated](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-documentation) +* [] ["Unreleased" section of the Changelog updated](https://gitlab.arm.com/networking/ral/-/blob/main/CHANGELOG.md#unreleased) * [] [`clang-format` and `clang-tidy` run and changes included (C/C++ code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-cc-code-style) * [] [`flake8` run and changes included (Python code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-python-code-style) * [] [Tests added or updated](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-writing-tests) diff --git a/.gitlab/merge_request_templates/Documentation.md b/.gitlab/merge_request_templates/Documentation.md index 249b11b56a0a0b22ac0ff6609d5768771bf7ea31..ab2485305a5ffe4233143bb71ab77f426fa778fc 100644 --- a/.gitlab/merge_request_templates/Documentation.md +++ b/.gitlab/merge_request_templates/Documentation.md @@ -9,6 +9,7 @@ If this Merge Request addresses an [Issue](https://gitlab.arm.com/networking/ral ## Checklist * [] [Contribution meets RAL's licence terms](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-licensing-information) +* [] ["Unreleased" section of the Changelog updated](https://gitlab.arm.com/networking/ral/-/blob/main/CHANGELOG.md#unreleased) * [] [`make docs` target runs successfully](https://gitlab.arm.com/networking/ral/-/blob/main/README.md?ref_type=heads#user-content-documentation) * [] [`clang-format` and `clang-tidy` run and changes included (C/C++ code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-cc-code-style) * [] [`flake8` run and changes included (Python code)](https://gitlab.arm.com/networking/ral/-/blob/main/CONTRIBUTING.md#user-content-python-code-style) diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 
0000000000000000000000000000000000000000..ee10db9dbc912adc24907fd24fb240426fd99ab3 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,423 @@ +# Changelog + +All notable changes to the Arm RAN Acceleration Library (ArmRAL) project will be +documented in this file. + +## [Unreleased] + +### Added + +### Changed + +### Deprecated + +### Removed + +### Fixed + +### Security + +## [24.01] - 2024-01-19 + +### Changed +- Extended `armral_cmplx_pseudo_inverse_direct_f32` and +`armral_cmplx_pseudo_inverse_direct_f32_noalloc` to compute the regularized +pseudo-inverse of a single complex 32-bit matrix of size `M-by-N` for cases +where `M > N` in addition to the cases where `M <= N`. + +- Improved performance of `armral_turbo_decode_block` and +`armral_turbo_decode_block_noalloc`. + +- Improved SVE2 performance of `armral_seq_generator`, for the cases when +`sequence_len` is not a multiple of 64. + +### Fixed +- LDPC block encoding (`armral_ldpc_encode_block`), rate matching +(`armral_ldpc_rate_matching`) and rate recovery (`armral_ldpc_rate_recovery`), +and the corresponding channel simulator, now support the insertion and removal +of filler bits as described in the 3GPP Technical Specification (TS) 38.212. +From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g). + +## [23.10] - 2023-10-06 + +### Changed +- Extended the `sequence_len` parameter of `armral_seq_generator` to `uint32_t`. +From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g). + +- Added parameter `i_bil` to `armral_polar_rate_matching` and +`armral_polar_rate_recovery` to enable or disable bit interleaving. From +[@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g). + +- Added parameter `nref` to `armral_ldpc_rate_matching` and +`armral_ldpc_rate_recovery` to enable the functions to be used with a soft +buffer size. From [@Suraj4g5g](https://gitlab.arm.com/Suraj4g5g). +
+- Improved Neon performance of Polar block decoding +(`armral_polar_decode_block`) for list sizes 1, 2, 4 and 8. + +- Improved Neon performance of LDPC block decoding (`armral_ldpc_decode_block` +and `armral_ldpc_decode_block_noalloc`). + +- Simulation programs are now built by default and are tested by the make check +target. + +## [23.07] - 2023-07-07 + +### Added +- New function to compute the regularized pseudo-inverse of a single complex +32-bit floating-point matrix (`armral_cmplx_pseudo_inverse_direct_f32`). + +- New function to compute the multiplication of a complex 32-bit floating-point +matrix with its conjugate transpose (`armral_cmplx_mat_mult_aah_f32`). + +- New function to compute the complex 32-bit floating-point multiplication of +the conjugate transpose of a matrix with a matrix +(`armral_cmplx_mat_mult_ahb_f32`). + +- Variants of existing functions which take a pre-allocated buffer rather than +performing memory allocations internally. For functions where the buffer size is +not easily calculated from the input parameters, helper functions to calculate +the required size have been provided. + +- Neon-optimized implementation of batched complex 32-bit floating-point +matrix-vector multiplication (`armral_cmplx_mat_vec_mult_batch_f32`). + +- SVE2-optimized implementation of complex 32-bit floating-point general matrix +inverse for matrices of size `2x2`, `3x3` and `4x4` +(`armral_cmplx_mat_inverse_f32`). + +### Changed +- Improved Neon and SVE2 performance of Mu Law compression +(`armral_mu_law_compr_8bit`, `armral_mu_law_compr_9bit`, and +`armral_mu_law_compr_14bit`). + +- Improved Neon performance of 8-bit block float compression +(`armral_block_float_compr_8bit`). + +- Improved SVE2 performance of 9-bit block scaling decompression +(`armral_block_scaling_decompr_9bit`). + +- Improved SVE2 performance of 14-bit block scaling decompression +(`armral_block_scaling_decompr_14bit`). 
+ +- Improved SVE2 performance of 8-bit and 12-bit block float compression +(`armral_block_float_compr_8bit` and `armral_block_float_compr_12bit`). + +- Moved the definition of the symbol rate out of the `ebn0_to_snr` function +(`simulation/awgn/awgn.cpp`) so that it is now a parameter that gets passed in +by each of the simulation programs. + +- Updated the `convolutional_awgn` simulation program to use OpenMP +(`simulation/convolutional_awgn/convolutional_awgn.cpp`). + +- Updated simulation programs to accept a path to write graphs to, instead of +auto-generating filenames. + +- Added the maximum number of iterations to the output of the Turbo simulation +program (`simulation/turbo_awgn/turbo_error_rate.py`). + +- Updated formatting of labels in simulation graph legends. + +### Fixed +- Removed bandwidth scaling in all simulation programs so that the maximum +spectral efficiency does not exceed the number of bits per symbol. + +- Convolutional decoding algorithm +(`armral_tail_biting_convolutional_decode_block`) now returns correct results +for input lengths greater than 255. + +- Test file for convolutional decoding (`test/ConvCoding/decoding/main.cpp`) is +updated so that the tests pass as expected for input lengths which are not a +multiple of 4. + +- Neon block float decompression functions (`armral_block_float_decompr_8bit`, +`armral_block_float_decompr_9bit`, `armral_block_float_decompr_12bit`, and +`armral_block_float_decompr_14bit`) now truncate values before storing rather +than rounding them. This means the Neon implementations of these functions now +have the same behavior as the SVE implementations. + +- Neon block scaling decompression functions +(`armral_block_scaling_decompr_8bit`, `armral_block_scaling_decompr_9bit`, and +`armral_block_scaling_decompr_14bit`) now truncate values before storing rather +than rounding them. This means the Neon implementations of these functions now +have the same behavior as the SVE implementations. 
+ +## [23.04] - 2023-04-21 + +### Added +- Cyclic Redundancy Check (CRC) attachment function +(`armral_polar_crc_attachment`) for Polar codes, described in section 5.2.1 of +the 3GPP Technical Specification (TS) 38.212. + +- CRC function to check the validity of the output(s) of Polar decoding +(`armral_check_crc_polar`). + +- New simulation program `modulation_awgn` which plots the error rate versus +Eb/N0 (or signal-to-noise ratio (SNR)) of taking a hard demodulation decision +for data sent over a noisy channel with no forward error correction. + +- Added a field called `snr` to the JSON output of all simulation programs, +which stores the signal-to-noise ratio. + +- Added a flag called `x-unit` to all plotting scripts which allows the user to +choose whether Eb/N0 or SNR is plotted on the x-axis. + +- Added CRC attachment and check in Polar codes simulation. + +### Changed + +- Updated +[license terms](https://gitlab.arm.com/networking/ral/-/blob/main/license_terms/BSD-3-Clause.txt) +to BSD-3-Clause. + +- Updated Polar decoding (`armral_polar_decode_block`) to accept a list size of +8. + +- LDPC decoding (`armral_ldpc_decode_block`) can optionally make use of attached +CRC information to terminate iteration early in the case that a match is found. + +- Improved Neon performance of tail biting convolutional encoder for LTE +(`armral_tail_biting_convolutional_encode_block`). + +- Improved Neon performance of tail biting convolutional decoder for LTE +(`armral_tail_biting_convolutional_decode_block`). + +### Fixed +- Calculation of the encoded data length in the LDPC simulation program +(`armral/simulation/ldpc_awgn/ldpc_error_rate.py`) is updated to match that used +in Arm RAN Acceleration Library. + +- Graphs generated from results of simulation programs in the simulation +directory no longer plot Shannon limits and theoretical maxima versus block +error rates. Shannon limits and theoretical maxima continue to be plotted for +bit error rates. 
+ +## [23.01] - 2023-01-27 + +### Added +- Rate matching for Turbo coding (`armral_turbo_rate_matching`). This implements +the operations in section 5.1.4.1 of the 3GPP Technical Specification (TS) +36.212. + +- Rate recovery for Turbo coding (`armral_turbo_rate_recovery`). This implements +the inverse operations of rate matching. Rate matching is described in section +5.1.4.1 of the 3GPP Technical Specification (TS) 36.212. + +- Tail-biting convolutional encoder for LTE +(`armral_tail_biting_convolutional_encode_block`). + +- Tail-biting convolutional decoder for LTE +(`armral_tail_biting_convolutional_decode_block`). + +- Scrambling for Physical Uplink Control Channels (PUCCH) formats 2, 3 and 4, +Physical Downlink Shared Channel (PDSCH), Physical Downlink Control Channel +(PDCCH), and Physical Broadcast Channel (PBCH) (`armral_scramble_code_block`). +This covers scrambling as described in 3GPP Technical Specification (TS) 38.211, +sections 6.3.2.5.1, 6.3.2.6.1, 7.3.1.1, 7.3.2.3, and 7.3.3.1. + +- Simulation program for LTE tail-biting convolutional coding +(`armral/simulation/convolutional_awgn`). + +- Python script that allows users to draw the data rates of each modulation and +compare them to the capacity of the AWGN channel +(`armral/simulation/capacity/capacity.py`). + +- SVE2-optimized implementation of complex 32-bit floating point matrix-vector +multiplication (`armral_cmplx_mat_vec_mult_f32`). + +- SVE2-optimized implementation of 14-bit block scaling decompression +(`armral_block_scaling_decompr_14bit`). + +### Changed +- Modified error rate Python scripts (under `armral/simulation`) to use Eb/N0 as +x-axis (instead of the SNR) and to show the Shannon limits. + +- Added Turbo rate matching and recovery to the Turbo simulation program +(`armral/simulation/turbo_awgn/turbo_awgn.cpp`). + +- Improved Neon performance of block-float decompression for 9-bit and 14-bit +block-float representations
(`armral_block_float_decompr_9bit` and +`armral_block_float_decompr_14bit`). + +- Improved Neon performance of complex 32-bit floating point matrix-vector +multiplication (`armral_cmplx_mat_vec_mult_f32`). + +- Improved Neon performance of Gold sequence generator (`armral_seq_generator`). + +- Improved Neon performance of general matrix inversion +(`armral_cmplx_mat_inverse_f32`). + +- Improved Neon performance of batched general matrix inversion +(`armral_cmplx_mat_inverse_batch_f32`). + +### Fixed +- Documentation of the interface for Polar rate recovery +(`armral_polar_rate_recovery`) updated to reflect how the parameters are used in +the implementation. + +## [22.10] - 2022-10-07 + +### Added +- SVE2-optimized implementations of `2x2` and `4x4` matrix multiplication +functions where in-phase and quadrature components are separated +(`armral_cmplx_mat_mult_2x2_f32_iq` and `armral_cmplx_mat_mult_4x4_f32_iq`). + +### Changed +- The program to evaluate the error-correction performance of Polar coding in +the presence of additive white Gaussian noise (AWGN) located in +`simulation/polar_awgn` is updated to no longer take the length of a code block +as a parameter. + +- Improved the Neon and SVE2 performance of LDPC encoding for a single code +block (`armral_ldpc_encode_block`). + +- Improved the Neon performance of Turbo decoding for a single code block +(`armral_turbo_decode_block`). + +- Improved the Neon performance of Turbo encoding for a single code block +(`armral_turbo_encode_block`). + +- Improved the Neon performance of 32-bit floating point general matrix +inversion (`armral_cmplx_mat_inverse_f32`). + +- Improved the Neon performance of 32-bit floating point batch general matrix +inversion (`armral_cmplx_mat_inverse_batch_f32` and +`armral_cmplx_mat_inverse_batch_f32_pa`). + +### Fixed +- The Turbo coding simulation program now builds when performing an SVE build of +the library. 
+ +## [22.07] - 2022-07-15 + +### Added +- SVE2-optimized implementation of equalization with four subcarriers +(`armral_solve_*x*_4sc_f32`). + +- Matrix-vector multiplication functions for batches of 32-bit complex +floating-point matrices and vectors (`armral_cmplx_mat_vec_mult_batch_f32` and +`armral_cmplx_mat_vec_mult_batch_f32_pa`). + +- LTE Turbo encoding function (`armral_turbo_encode_block`) that implements the +encoding scheme defined in section 5.1.3.2 of the 3GPP Technical Specification +(TS) 36.212 "Multiplexing and channel coding". + +- LTE Turbo decoding function (`armral_turbo_decode_block`) that implements a +maximum a posteriori (MAP) algorithm to return a hard decision (either 0 or 1) +for each output bit. + +- Functions to perform rate matching and rate recovery for Polar coding. These +implement the specification in section 5.4.1 of the 3GPP Technical Specification +(TS) 38.212. + +- Functions to perform rate matching and rate recovery for LDPC coding. This +implements the specification in section 5.4.2 of the 3GPP Technical +Specification (TS) 38.212. + +- Utilities to simulate the error correction performance for Polar, LDPC and +Turbo coding over a noisy channel. + +### Changed +- Renamed the Polar encoding and decoding functions to +`armral_polar_encode_block` and `armral_polar_decode_block`. + +- Improved the Neon and SVE2 performance of 16-QAM modulation +(`armral_modulation` with `armral_modulation_type` set to `ARMRAL_MOD_16QAM`). + +- Improved the SVE2 performance of Mu law compression and decompression +(`armral_mu_law_compr_*` and `armral_mu_law_decompr_*`). + +- Improved the SVE2 performance of block float compression and decompression +(`armral_block_float_compr_*` and `armral_block_float_decompr_*`). + +- Improved the SVE2 performance of 8-bit block scaling compression +(`armral_block_scaling_compr_8bit`). 
+ +- Improved the performance of 32-bit floating-point and 16-bit fixed-point +complex valued FFTs (`armral_fft_execute_cf32` and `armral_fft_execute_cs16`) +with large prime factors. + +## [22.04] - 2022-04-08 + +### Added +- SVE2-optimized implementations of batched 16-bit fixed-point matrix-vector +multiplication with 64-bit and 32-bit fixed-point accumulators +(`armral_cmplx_mat_vec_mult_batch_i16`, +`armral_cmplx_mat_vec_mult_batch_i16_pa`, +`armral_cmplx_mat_vec_mult_batch_i16_32bit`, +`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`). + +- SVE2-optimized implementation of complex 32-bit floating-point singular value +decomposition (`armral_svd_cf32`). + +- SVE2-optimized implementations of complex 32-bit floating-point Hermitian +matrix inversion for a single matrix or a batch of matrices of size `3x3` +(`armral_cmplx_hermitian_mat_inverse_f32` and +`armral_cmplx_hermitian_mat_inverse_batch_f32`). + +- SVE2-optimized implementations of 9-bit and 14-bit Mu law compression +(`armral_mu_law_compr_9bit` and `armral_mu_law_compr_14bit`). + +- SVE2-optimized implementations of 9-bit and 14-bit Mu law decompression +(`armral_mu_law_decompr_9bit` and `armral_mu_law_decompr_14bit`). + +- Complex 32-bit floating-point general matrix inversion for matrices of size +`2x2`, `3x3`, `4x4`, `8x8`, and `16x16` (`armral_cmplx_mat_inverse_f32`). + +### Changed +- Improved the performance of batched 16-bit fixed-point matrix-vector +multiplication with 64-bit fixed-point accumulator +(`armral_cmplx_mat_vec_mult_batch_i16` and +`armral_cmplx_mat_vec_mult_batch_i16_pa`). + +- Improved the performance of batched 16-bit fixed-point matrix-vector +multiplication with 32-bit fixed-point accumulator +(`armral_cmplx_mat_vec_mult_batch_i16_32bit` and +`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`). + +- Improved the performance of 14-bit block float compression +(`armral_block_float_compr_14bit`). 
+ +- Improved the performance of 14-bit block scaling compression +(`armral_block_scaling_compr_14bit`). + +- Improved the performance of 14-bit Mu law compression +(`armral_mu_law_compr_14bit`). + +- Improved the performance of complex 32-bit floating-point singular value +decomposition (`armral_svd_cf32`). The input matrix now needs to be stored in +column-major order. Output matrices are also returned in column-major order. + +- Improved the performance of complex 32-bit floating-point Hermitian matrix +inversion for a single matrix or a batch of matrices of size `3x3` +(`armral_cmplx_hermitian_mat_inverse_f32` and +`armral_cmplx_hermitian_mat_inverse_batch_f32`). + +- Improved the performance of Polar list decoding (`armral_polar_decoder`) with +list size 4. The performance for list size 1 is slightly reduced, but the +list size 4 gives much better error correction. + +- Added restrictions to the number of matrices and vectors in the batch for the +functions that perform batched matrix-vector multiplications in fixed-point +precision (`armral_cmplx_mat_vec_mult_batch_i16`, +`armral_cmplx_mat_vec_mult_batch_i16_pa`, +`armral_cmplx_mat_vec_mult_batch_i16_32bit`, +`armral_cmplx_mat_vec_mult_batch_i16_32bit_pa`). + +- The function to perform fixed-point complex matrix-matrix multiplication with +a 64-bit accumulator (`armral_cmplx_mat_mult_i16`) now narrows from the 64-bit +accumulator to a 32-bit intermediate value, and then to the 16-bit result using +truncating narrowing operations instead of rounding operations. This matches the +behavior in the fixed-point complex matrix-matrix multiplication with a 32-bit +accumulator. + +- The function to perform fixed-point complex matrix-vector multiplication with +a 64-bit accumulator (`armral_cmplx_mat_vec_mult_i16`) now narrows from the +64-bit accumulator to a 32-bit intermediate value, and then to the 16-bit result +using truncating narrowing operations instead of rounding operations. 
This +matches the behavior in the fixed-point complex matrix-vector multiplication +with a 32-bit accumulator.