Commit 9805f39d authored by Xi Ruoyao, committed by Jason A. Donenfeld

LoongArch: vDSO: Tune chacha implementation



As Christophe pointed out, scheduling the instructions of the chacha
implementation the way GCC does can improve performance.

The tuning does not introduce much complexity (basically it's just
reordering some instructions), and it does not hurt readability much
either: the tuned code actually looks more like a textbook-style
implementation based on 128-bit vectors.  So overall it's a good deal
to me.
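
For readers unfamiliar with the "textbook-style" shape mentioned above, here
is a minimal C sketch. It is not the assembly from this commit; the function
names (rotl32, quarter_round, step_rows, quarter_round_rows) are made up for
illustration. It assumes the usual ChaCha layout of four rows of four 32-bit
words, and performs each quarter-round step on all four columns before moving
to the next step, so that consecutive operations are independent of each
other.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Reference: one scalar ChaCha quarter-round, a single dependency chain. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rotl32(*d, 16);
	*c += *d; *b ^= *c; *b = rotl32(*b, 12);
	*a += *b; *d ^= *a; *d = rotl32(*d, 8);
	*c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* One step "x += y; z = rotl(z ^ x, n)" applied to all four lanes. */
static void step_rows(uint32_t x[4], const uint32_t y[4], uint32_t z[4],
		      unsigned n)
{
	for (int i = 0; i < 4; i++)
		x[i] += y[i];
	for (int i = 0; i < 4; i++)
		z[i] = rotl32(z[i] ^ x[i], n);
}

/*
 * Four quarter-rounds at once, written row-wise as a 128-bit vector
 * implementation would be: the four lanes of each step do not depend
 * on each other.
 */
static void quarter_round_rows(uint32_t a[4], uint32_t b[4],
			       uint32_t c[4], uint32_t d[4])
{
	step_rows(a, b, d, 16);
	step_rows(c, d, b, 12);
	step_rows(a, b, d, 8);
	step_rows(c, d, b, 7);
}
```

Both forms compute the same result per column; only the order in which the
operations are issued differs, which is the kind of reordering described
above.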

Tested with vdso_test_getchacha and benchmarked with vdso_test_getrandom.
On an LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64,
which have a lower issue rate.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
parent 6ff2c290