Friday, 2022-05-06

zx2c4shorne: how big are openrisc cache lines?00:48
zx2c4i just noticed that _immu_trampoline is aligned to 64 bytes but a comment lower down says it should be aligned to 4x the cache line size00:49
zx2c4        .align 6400:50
zx2c4_immu_trampoline:00:50
zx2c4...00:50
zx2c4        // immu_trampoline is (4x) CACHE_LINE aligned00:50
shornedepending on the cpu it's 16 bytes or 32 bytes02:53
shorneI found that if we have the failure condition, adding 1-15 nops will fix it, but adding 16 nops will trigger the issue again.  So it seems to be something to do with 4*16, 64-byte offsets02:55
shorneso it might be related to this02:55
shornebut in qemu caches are disabled02:55
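The arithmetic behind the 64-byte observation can be sketched as follows. This is a minimal illustration, assuming each l.nop occupies 4 bytes (OpenRISC instructions are a fixed 32 bits wide). The function name and the constants are illustrative choices, not taken from the kernel source.

```python
NOP_BYTES = 4      # one l.nop; OpenRISC instructions are 32 bits wide
WINDOW_BYTES = 64  # 4 * 16-byte cache lines, the offset mentioned above


def shift_within_window(nop_count: int) -> int:
    """Return how far nop_count nops move later code, modulo 64 bytes."""
    return (nop_count * NOP_BYTES) % WINDOW_BYTES


if __name__ == "__main__":
    # 1-15 nops change the offset inside the window; 16 nops restore it.
    for n in (0, 1, 15, 16):
        print(n, shift_within_window(n))
```

Under that assumption, 16 nops shift everything by exactly 64 bytes, so the code lands on the same offset within a 64-byte window as it did with no padding.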
shorneI was thinking about System.map, and playing with something.  If I add l.nop's in functions higher in address space the issue persists05:07
shorneIf I add nop's in functions lower in the map, the issue is fixed05:07
shorneSo I did a kind of binary search05:07
shorneunfortunately I ended up in lib/crypto/curve25519-fiat32.c05:08
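The binary search described above could look roughly like this. It is only a sketch: `padding_fixes` is a hypothetical callback standing in for "add nops to this function, rebuild, boot, and check whether the test passes", and the search assumes the behaviour reported above holds everywhere (padding at or below the sensitive function fixes the issue, padding above it does not).

```python
def find_sensitive_function(symbols, padding_fixes):
    """Return the highest-addressed symbol whose padding still fixes the issue.

    symbols: function names sorted by ascending address (e.g. from System.map).
    padding_fixes: hypothetical callback, True if padding that function
    makes the failure go away. Assumed True for symbols[0].
    """
    lo, hi = 0, len(symbols) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if padding_fixes(symbols[mid]):
            lo = mid       # the sensitive code is at mid or higher
        else:
            hi = mid - 1   # padding here is too high to move it
    return symbols[lo]
```

Each probe costs a full rebuild and boot, so halving the range each time keeps the number of kernels to build logarithmic in the number of functions.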
shorneInteresting, adding this: __attribute__((optimize("align-functions=8"))) to that file fixes the issue05:24
shornethis is the only diff: https://gist.github.com/04ca31ee4c512bd87fc2dc5b6298fbd205:24
shorneit makes almost no sense, those extra jumps (l.j) are actually just 0x00000000 padding at the end of the function05:30
shorneso no binary changes other than alignment fixes by 8 bytes, curve25519 fixes the test failure05:31
shorneI mean alignment of curve25519-fiat32 by 8 bytes05:31
shorneBut this is just the test that is failing, it could be any part of the code that is having this issue05:50
shorneusing the nop sliding, I can reproduce on the non-preempt kernel too07:07
shornebut still not sure what is going on, I am wondering if I can write a qemu plugin to detect any difference here07:08
shorneI guess something that captures the interrupts07:08
shornemaybe basic qemu tracing is enough07:09
shornelet me try to reproduce on the latest compiler too07:09
shorneok, reproduced on gcc 12 too07:12
shorneok, this is possibly a qemu issue, I have booted the same kernel on qemu and my fpga board and the fpga does not reproduce the issue12:41
shornezx2c4: ^^12:42
shornethe only thing I find so far is that12:42
shorne  1. the alignment of fe_mul_impl triggers the issue12:43
shorne  2. the 32-bit implementation of fe_mul_impl, though it looks like there are no branches in the C code, has many in assembly due to __muldi3, and also somehow branches to skip multiplications12:45
shorneI'll try to make a user-mode linux test of this curve25519 selftest to see if I can reproduce in a standalone test12:47
shornemaybe it won't work without the tick timer interrupting it12:47
zx2c4shorne: woah awesome detective work13:11
zx2c4wireguard-tools repo has a port of that code to userspace already if it helps13:11
zx2c4https://git.zx2c4.com/wireguard-tools/tree/src13:12
tpbTitle: src - wireguard-tools - Required tools for WireGuard, such as wg(8) and wg-quick(8) (at git.zx2c4.com)13:12
zx2c4And yeah i was looking at the __muldi3 implementation the other day...13:12
zx2c4That code should be branchless in general13:13
zx2c4At least in terms of branching on input13:13
zx2c4To avoid sidechannel attacks13:13
zx2c4The reason i suspect multiplication since the beginning btw is because i also once saw it with the chacha20poly1305 test13:37
zx2c4And poly1305 does some multiplications13:37
zx2c4Faster and fewer ones, but nonetheless that's its essential operation13:38
zx2c4https://git.zx2c4.com/linux-rng/tree/lib/crypto/poly1305-donna32.c13:39
shornewow, I turned on qemu tracing (been playing with different views) -D trace.txt -d int,exec,nochain22:20
shornewith this one it runs really slow, but it reproduces the issue more often22:20
shornei.e. like 10+ failures22:21
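One way to look for a difference between a passing and a failing trace is to find the first line where they diverge. The sketch below makes no assumption about the qemu log format and works on any two sequences of lines; the function name is illustrative. It does assume the two runs are comparable line by line, which timer interrupts arriving at different points may prevent.

```python
from itertools import zip_longest


def first_divergence(good_lines, bad_lines):
    """Return (index, good_line, bad_line) for the first differing line.

    Returns None when both sequences are identical. A missing line
    (one log shorter than the other) is reported as None on that side.
    """
    pairs = zip_longest(good_lines, bad_lines)
    for index, (good, bad) in enumerate(pairs):
        if good != bad:
            return index, good, bad
    return None
```

For large logs, passing two open file objects instead of lists keeps memory use flat, since both are read lazily one line at a time.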