*** tpb <[email protected]> has joined #openrisc | 00:00 | |
zx2c4 | shorne: how big are openrisc cache lines? | 00:48 |
---|---|---|
zx2c4 | i just noticed that _immu_trampoline is aligned to 64 bytes but a comment lower down says it should be aligned to 4*cache_size | 00:49 |
zx2c4 | .align 64 | 00:50 |
zx2c4 | _immu_trampoline: | 00:50 |
zx2c4 | ... | 00:50 |
zx2c4 | // immu_trampoline is (4x) CACHE_LINE aligned | 00:50 |
shorne | depending on the cpu it's 16 bytes or 32 bytes | 02:53 |
shorne | I found that if we have the failure condition, adding 1-15 nops will fix it, but adding 16 nops will trigger the issue again. So it seems to be something to do with 4*16, 64-byte offsets | 02:55 |
shorne | so it might be related to this | 02:55 |
shorne | but in qemu caches are disabled | 02:55 |
shorne | I was thinking about System.map, and playing with something. If I add l.nop's in functions higher in address space the issue persists | 05:07 |
shorne | If I add nop's in functions lower in the map, the issue is fixed | 05:07 |
shorne | So I did a kind of binary search | 05:07 |
shorne | unfortunately I ended up in lib/crypto/curve25519-fiat32.c | 05:08 |
shorne | Interesting, adding this: __attribute__((optimize("align-functions=8"))) to that file fixes the issue | 05:24 |
shorne | this is the only diff: https://gist.github.com/04ca31ee4c512bd87fc2dc5b6298fbd2 | 05:24 |
shorne | it makes almost no sense, those extra jumps (l.j) are actually just x0000000 padding the end of the function | 05:30 |
shorne | so no binary changes other than alignment fixes by 8 bytes, curve25519 fixes the test failure | 05:31 |
shorne | I mean alignment of curve25519-fiat32 by 8 bytes | 05:31 |
shorne | But this is just the test that is failing, it could be any part of the code that is having this issue | 05:50 |
shorne | using the nop sliding, I can reproduce on the non-preempt kernel too | 07:07 |
shorne | but still not sure what is going on, I am wondering if I can write a qemu plugin to detect any difference here | 07:08 |
shorne | I guess something that captures the interrupts | 07:08 |
shorne | maybe basic qemu tracing is enough | 07:09 |
shorne | let me try to reproduce on the latest compiler too | 07:09 |
shorne | ok, reproduced on gcc 12 too | 07:12 |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC (Ping timeout: 240 seconds) | 09:35 | |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has joined #openrisc | 09:39 | |
shorne | ok, this is possibly a qemu issue, I have booted the same kernel on qemu and my fpga board and the fpga does not reproduce the issue | 12:41 |
shorne | zx2c4: ^^ | 12:42 |
shorne | the only thing I find so far is that | 12:42 |
shorne | 1. the alignment of fe_mul_impl triggers the issue | 12:43 |
shorne | 2. the 32-bit implementation of fe_mul_impl, though it looks like there are no branches in the C code, has many in assembly due to __muldi3, and also somehow has branches to skip multiplications | 12:45 |
shorne | I'll try to make a user-mode linux test of this curve25519 selftest to see if I can reproduce in a standalone test | 12:47 |
shorne | maybe it won't work without the tick timer interrupting it | 12:47 |
zx2c4 | shorne: woah awesome detective work | 13:11 |
zx2c4 | wireguard-tools repo has a port of that code to userspace already if it helps | 13:11 |
zx2c4 | https://git.zx2c4.com/wireguard-tools/tree/src | 13:12 |
tpb | Title: src - wireguard-tools - Required tools for WireGuard, such as wg(8) and wg-quick(8) (at git.zx2c4.com) | 13:12 |
zx2c4 | And yea i was looking at the multidi implementation the other day... | 13:12 |
zx2c4 | That code should be branchless in general | 13:13 |
zx2c4 | At least in terms of branching on input | 13:13 |
zx2c4 | To avoid sidechannel attacks | 13:13 |
zx2c4 | The reason i've suspected multiplication since the beginning btw is that i also once saw it with the chacha20poly1305 test | 13:37 |
zx2c4 | And poly1305 does some multiplications | 13:37 |
zx2c4 | Faster and fewer ones, but nonetheless that's its essential operation | 13:38 |
zx2c4 | https://git.zx2c4.com/linux-rng/tree/lib/crypto/poly1305-donna32.c | 13:39 |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC (Ping timeout: 240 seconds) | 21:59 | |
shorne | wow, I turned on qemu tracing (been playing with different views): "-D trace.txt -d int,exec,nochain" | 22:20 |
shorne | with this one it runs really slow, but it reproduces the issue more often | 22:20 |
shorne | i.e. like 10+_ failures | 22:21 |
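
The nop-sliding observation in the log (1-15 l.nops make the failure go away, 16 bring it back) is consistent with the code's offset inside a 64-byte window being what matters. A minimal sketch of that arithmetic follows, assuming 4-byte instructions and a 4 x 16-byte window as discussed above; the base address and the helper names are hypothetical and are not taken from the kernel or from the log.

```python
NOP_BYTES = 4    # OpenRISC instructions, l.nop included, are 32 bits wide
WINDOW = 4 * 16  # 4 x a 16-byte cache line = 64 bytes, the figure from the log


def offset_in_window(addr, window=WINDOW):
    """Return the byte offset of addr inside its window-aligned block."""
    return addr % window


def nops_restoring_offset(base, max_nops=32):
    """Return the nop counts (1..max_nops) that leave base's window offset unchanged.

    Inserting n nops ahead of a function shifts it by n * NOP_BYTES bytes,
    so the offset repeats only when that shift is a multiple of the window.
    """
    start = offset_in_window(base)
    return [
        n
        for n in range(1, max_nops + 1)
        if offset_in_window(base + n * NOP_BYTES) == start
    ]


if __name__ == "__main__":
    base = 0xC0012340  # hypothetical function address, for illustration only
    print(nops_restoring_offset(base))  # [16, 32]
```

With a 32-byte cache line the window would be 128 bytes and the offset would first repeat at 32 nops, so the 16-nop period seen here fits the 16-byte figure. This only illustrates the modular arithmetic; it says nothing about why qemu misbehaves at that offset.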