*** tpb <[email protected]> has joined #openrisc | 00:00 | |
shorne | yeah, I think your test does way more io | 00:12 |
---|---|---|
shorne | zx2c4: so probably it points to something with the irq handling hanging (I fixed some bugs related to this) probably not all | 00:12 |
zx2c4 | when hung, cpu usage of the process was 0 | 00:12 |
zx2c4 | so probably mutex deadlock thing | 00:12 |
shorne | I did some kernel patches to try to detect IPI hangs (i.e. time start of IPI vs time complete > 10ms), but that didn't reveal anything | 00:13 |
shorne | so it's deadlocking somewhere else | 00:13 |
shorne | my guess is it's still in qemu somewhere | 00:14 |
zx2c4 | yea | 00:15 |
shorne | zx2c4: I see you already got the patch about corrupt stack in there | 00:19 |
zx2c4 | shorne: no real stack corruption. just a "what if" thing linus wanted | 00:21 |
shorne | yeah, I didn't read the code, just the mail from linus and you. I thought it was the actual call stack, but this is just an array called stack | 00:22 |
shorne | well, I guess its on the stack | 00:22 |
shorne | [ 310.456000] reboot: Restarting system | 00:36 |
zx2c4 | shorne: wow it finished for you | 00:46 |
zx2c4 | mine did not | 00:46 |
shorne | [ 318.944000] reboot: Restarting system | 01:01 |
zx2c4 | shorne: i applied your patches on top of qemu master branch | 01:05 |
shorne | yours is on 5.19-rc8, this is the linux branch I am using right now: https://github.com/stffrdhrn/linux/commits/or1k-wireguard-2 | 01:05 |
zx2c4 | ah | 01:05 |
shorne | its just my or1k-5.20-updates + the wireguard selftests | 01:05 |
shorne | on qemu I am also running on top of qemu master basically: https://github.com/stffrdhrn/qemu/commits/or1k-virt-4 | 01:06 |
shorne | what run times do you typically see for the wireguard tests? | 01:08 |
shorne | like 100 seconds? | 01:08 |
zx2c4 | it hung before and i ctrl+c'd it after 10 minutes | 01:08 |
zx2c4 | oh, on other platforms? | 01:08 |
zx2c4 | lets see | 01:08 |
zx2c4 | https://www.wireguard.com/build-status/ | 01:08 |
tpb | Title: Build Status - WireGuard (at www.wireguard.com) | 01:08 |
shorne | I know host system matters, but just wondering what is the ballpark | 01:08 |
zx2c4 | x86 gives `[ 100.659072] reboot: machine restart` | 01:08 |
zx2c4 | (that's not TCG'd) | 01:08 |
shorne | [ 175.065665] reboot: Restarting system <-- arm | 01:09 |
shorne | [ 240.112282] reboot: Restarting system <-- riscv32 | 01:09 |
zx2c4 | these are concurrent runs though | 01:09 |
zx2c4 | lemme do one in isolation | 01:09 |
shorne | right, mine is in isolation | 01:09 |
shorne | interesting | 01:10 |
shorne | [ 459.399223] reboot: Restarting system <--- riscv32 net-next | 01:10 |
zx2c4 | likely due to net.git being pushed at the same time | 01:11 |
shorne | anyway I guess its just load on the server | 01:11 |
zx2c4 | huh, an or1k run just succeeded for me | 01:11 |
zx2c4 | (that previously failed) | 01:12 |
zx2c4 | so i guess there's some race | 01:12 |
shorne | yeah | 01:12 |
shorne | definitely some sort of race | 01:12 |
shorne | now my run is slow... [ 634.480000] wireguard: wg0: Interface created | 01:12 |
zx2c4 | oof | 01:13 |
shorne | successful, but | 01:13 |
shorne | [ 670.180000] reboot: Restarting system | 01:13 |
zx2c4 | i guess it escaped the deadlock somehow, eventually | 01:13 |
shorne | [+] NS0: ip link add dev wg0 type wireguard | 01:13 |
shorne | [ 114.432000] wireguard: wg0: Interface created | 01:13 |
shorne | [+] NS0: wg setconf wg0 /dev/fd/63 | 01:13 |
shorne | [ 560.792000] wireguard: wg0: Peer 14 created | 01:13 |
shorne | [+] NS0: wg show wg0 allowed-ips | 01:13 |
shorne | [+] NS0: ip link del wg0 | 01:14 |
shorne | [ 634.428000] wireguard: wg0: Peer 14 ((einval)) destroyed | 01:14 |
zx2c4 | yea thats where time gets killed for me too | 01:14 |
shorne | it usually had the delay around this point | 01:14 |
shorne | sometimes 1 minute | 01:14 |
shorne | sometimes 5 | 01:14 |
zx2c4 | it's when generating a massive string in bash | 01:14 |
shorne | sometimes 10+ | 01:14 |
zx2c4 | which involves lots of memory usage and allocations | 01:14 |
shorne | I remember you mentioned that before, how were you able to observe that bash is allocating a string there? | 01:15 |
shorne | you just know that's what it's doing there? | 01:15 |
zx2c4 | yea | 01:15 |
zx2c4 | I've stared at this test a lot :P | 01:15 |
zx2c4 | ARCH=arm in isolation: [ 125.600957] reboot: Restarting system | 01:15 |
shorne | right, so I should be able to get around 200 | 01:16 |
shorne | my compiler is not very well optimized compared to what they have invested in the compiler and instruction set in arm :) | 01:16 |
zx2c4 | https://www.irccloud.com/pastebin/xPjmUA25/ | 01:17 |
tpb | Title: Snippet | IRCCloud (at www.irccloud.com) | 01:17 |
zx2c4 | ARCH=riscv32 in isolation: [ 136.358275] reboot: Restarting system | 01:20 |
shorne | I see, that config it creates is big | 01:21 |
shorne | but not super huge | 01:24 |
shorne | ~2.4mb | 01:25 |
shorne | n0 wg setconf wg0 <(printf '%s\n' "${config[@]}") | 01:29 |
shorne | and this first one with 255x255 seems to be the one that's taking the most time | 01:30 |
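
The last few messages describe a bash array holding a large peer list (255x255 entries, roughly 2.4 MB) that is expanded with `printf '%s\n' "${config[@]}"` and fed to `wg setconf`. A minimal sketch of that pattern follows, for anyone who wants to reproduce the memory and allocation pressure outside the selftest. The function names `build_config` and `config_bytes`, the placeholder keys, and the address layout are all assumptions made for illustration; they are not copied from the WireGuard test script, and the resulting size will not match the ~2.4 MB figure exactly.

```shell
#!/usr/bin/env bash
# Sketch only: build a large WireGuard-style config in a bash array and
# measure it. The peer layout is hypothetical, not taken from the selftest.

build_config() {
    local rows=$1 cols=$2 a b
    # Global array, one config line per element.
    config=( "[Interface]" "ListenPort=0" )
    for ((a = 1; a <= rows; a++)); do
        for ((b = 1; b <= cols; b++)); do
            # Placeholder key: wg would reject this, a real config needs
            # a base64-encoded 32-byte public key here.
            config+=( "[Peer]" "PublicKey=<key-$a-$b>" "AllowedIPs=10.$a.$b.0/24" )
        done
    done
}

config_bytes() {
    # Same expansion as quoted in the log: one array element per output line.
    printf '%s\n' "${config[@]}" | wc -c | tr -d ' '
}

# Default to the 255x255 case mentioned in the log; pass smaller values
# as arguments to experiment.
build_config "${1:-255}" "${2:-255}"
echo "entries: ${#config[@]}"
echo "bytes: $(config_bytes)"
```

Running it under `time` on the emulated guest, with and without the final `printf` expansion, would be one way to see whether the delay comes from building the array or from expanding it. Feeding the output to `wg setconf` is left out on purpose, since that needs root and valid keys.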