avx512: encrypt multiples of 16 64-byte chacha20 blocks in parallel
To maximize CPU port utilization, this new implementation encrypts a single buffer up to 16*64 bytes at a time. It builds 16 64-byte states whose data are independent of one another, so they can be processed in parallel to produce the keystream. Once the states have been processed, they are transposed so that the keystream is contiguous in 16 ZMM registers, which are then XOR'd with the input buffer and written out.
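The data flow described above can be sketched in portable C. This is an illustration only, not the code from this commit: it uses no AVX-512 intrinsics, each inner loop over 16 lanes stands in for one instruction on a ZMM register, and all function names are hypothetical. It also assumes the RFC 8439 state layout (32-bit block counter, 96-bit nonce), which the commit message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* one 32-bit lane per block, like one ZMM register */

static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Quarter round on state words a, b, c, d of all 16 blocks at once.
 * x[w][l] holds word w of block l; lanes never interact. */
static void quarter_round_x16(uint32_t x[16][LANES], int a, int b, int c, int d)
{
    for (int l = 0; l < LANES; l++) {
        x[a][l] += x[b][l]; x[d][l] = rotl32(x[d][l] ^ x[a][l], 16);
        x[c][l] += x[d][l]; x[b][l] = rotl32(x[b][l] ^ x[c][l], 12);
        x[a][l] += x[b][l]; x[d][l] = rotl32(x[d][l] ^ x[a][l], 8);
        x[c][l] += x[d][l]; x[b][l] = rotl32(x[b][l] ^ x[c][l], 7);
    }
}

/* Hypothetical helper: generate 16 consecutive 64-byte keystream blocks. */
static void chacha20_keystream_x16(const uint8_t key[32], const uint8_t nonce[12],
                                   uint32_t counter, uint8_t ks[LANES * 64])
{
    static const uint32_t sigma[4] = {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574};
    uint32_t init[16][LANES], x[16][LANES];

    /* Build 16 states that differ only in the block counter. */
    for (int l = 0; l < LANES; l++) {
        for (int i = 0; i < 4; i++) init[i][l] = sigma[i];
        for (int i = 0; i < 8; i++) init[4 + i][l] = load32_le(key + 4 * i);
        init[12][l] = counter + (uint32_t)l;
        for (int i = 0; i < 3; i++) init[13 + i][l] = load32_le(nonce + 4 * i);
    }
    memcpy(x, init, sizeof x);

    for (int r = 0; r < 10; r++) { /* 10 double rounds = 20 rounds */
        quarter_round_x16(x, 0, 4, 8, 12);  quarter_round_x16(x, 1, 5, 9, 13);
        quarter_round_x16(x, 2, 6, 10, 14); quarter_round_x16(x, 3, 7, 11, 15);
        quarter_round_x16(x, 0, 5, 10, 15); quarter_round_x16(x, 1, 6, 11, 12);
        quarter_round_x16(x, 2, 7, 8, 13);  quarter_round_x16(x, 3, 4, 9, 14);
    }

    /* Add the initial state, then transpose: word w of block l moves to
     * byte offset l*64 + w*4 so each block's keystream is contiguous. */
    for (int w = 0; w < 16; w++)
        for (int l = 0; l < LANES; l++)
            store32_le(ks + l * 64 + w * 4, x[w][l] + init[w][l]);
}

/* Hypothetical helper: XOR up to 16*64 bytes of input with the keystream. */
static void chacha20_xor_x16(const uint8_t key[32], const uint8_t nonce[12],
                             uint32_t counter, const uint8_t *in, uint8_t *out,
                             size_t len)
{
    uint8_t ks[LANES * 64];
    assert(len <= sizeof ks);
    chacha20_keystream_x16(key, nonce, counter, ks);
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ ks[i];
}
```

In this layout, row `x[w]` plays the role of one ZMM register holding word `w` of all 16 blocks, which is why the final step needs a transpose before the keystream can be XOR'd against contiguous input bytes.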