*** tpb <[email protected]> has joined #openrisc | 00:00 | |
shorne | I am just playing with the best benchmarks to measure tlb misses on qemu automatically | 00:00 |
---|---|---|
shorne | but now there is one issue cropping up, kicking /init sometimes takes long like 10 seconds, and sometimes 1 second | 00:01 |
shorne | it seems to flip-flop between fast/slow | 00:01 |
shorne | well, kind of random | 00:02 |
shorne | it's one of those things I'd rather look at later, as the system is still pretty stable | 00:02 |
zx2c4 | hmmmm | 00:02 |
shorne | but it's annoying to have to wait 10 seconds for init to start | 00:02 |
zx2c4 | alright, well, i sent the patch | 00:02 |
zx2c4 | yea that is quite odd. i've seen it too | 00:02 |
shorne | thanks | 00:02 |
zx2c4 | the big test will be whether build.wireguard.com gets slowed down waiting for it or not | 00:02 |
shorne | so I am thinking phase 1 get stability patches out | 00:03 |
zx2c4 | (please don't modify the commit subject; wireguard commits are all uniform and i'd like to keep them that way) | 00:03 |
shorne | phase 2 get performance patches out, and try to fix this 10 second hang | 00:03 |
zx2c4 | yea. also also - once this is in tree, build.wireguard.com will be rebuilding and rerunning for every single commit pushed to a bunch of trees | 00:03 |
zx2c4 | which means it'll be a good way to find new bugs and crashes and stuff | 00:03 |
shorne | I usually don't modify subjects, so no worries | 00:04 |
zx2c4 | i guess i'll have to patch QEMU on the CI server | 00:04 |
zx2c4 | alright, qemu on CI server patched | 00:10 |
zx2c4 | so if you send your PR to linus in the next few hours i guess that's good timing for pacific coast for him | 00:11 |
zx2c4 | and then this will be churning away by morning | 00:11 |
zx2c4 | okay it's churning away on the CI server now | 00:19 |
shorne | I think I need my patches to be reviewed | 00:23 |
shorne | sorry, I am a bit slow :( | 00:23 |
shorne | basically it's a PCI support and irqchip patch | 00:24 |
shorne | probably PCI is a bit too big for the -rc series | 00:25 |
shorne | but also PCI is not needed probably for the wireguard CI | 00:26 |
zx2c4 | wgci uses mmio yea | 00:26 |
zx2c4 | hmm no console output? wonder what's up here | 00:27 |
zx2c4 | did i forget some patch | 00:27 |
zx2c4 | oh hah im dumb | 00:51 |
zx2c4 | it doesnt run because the other kernel pieces aren't there :-P | 00:51 |
zx2c4 | duh | 00:51 |
zx2c4 | alright i'll just be patient and wait for you to submit it | 00:52 |
shorne | I am testing a 5.19-fixes branch now | 00:58 |
shorne | with hopefully just the basics of what we need | 00:58 |
shorne | oh, it's running well so far, wireguard tests are running | 00:59 |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has joined #openrisc | 01:12 | |
shorne | zx2c4: ok, I have this queue ready to go: https://github.com/openrisc/linux/tree/or1k-5.19-fixes | 01:30 |
shorne | wireguard tests pass | 01:30 |
shorne | waiting for | 01:30 |
shorne | 1. irq patch to get acked | 01:30 |
shorne | 2. I'll push this to the next branch too for some more 'global' testing | 01:31 |
*** littlebobeep <littlebobeep!~alMalsamo@gateway/tor-sasl/almalsamo> has quit IRC (Quit: leaving) | 07:25 | |
shorne | ok, got ack on the IRQ patch | 08:32 |
shorne | it's in linux-next right now | 08:32 |
shorne | so likely I'll send to linux tomorrow AM | 08:33 |
shorne | to linus | 08:33 |
zx2c4 | shorne: great! Sounds like you got your ack! | 10:06 |
zx2c4 | shorne: i just pushed things to the CI server with the irqchip fix in there and it's broken - https://xn--4db.cc/D5dbnrvy | 12:27 |
zx2c4 | this is running your branch of qemu | 12:27 |
zx2c4 | with irqchip patch | 12:27 |
zx2c4 | for kernel | 12:27 |
zx2c4 | it failed like this after quite a bit of work done | 12:27 |
zx2c4 | indicating that this is a hard to hit bug that only happened under load... | 12:27 |
zx2c4 | here's the whole log https://א.cc/x3Vt402T | 12:28 |
zx2c4 | ooo it keeps splatting | 12:30 |
zx2c4 | https://xn--4db.cc/bKiNzmFE | 12:30 |
zx2c4 | shorne: what this indicates is there's probably still some kind of race | 12:32 |
zx2c4 | or ordering issue | 12:32 |
zx2c4 | because, if it works on a laptop, but fails when it's being run alongside 18 other concurrent tests on other archs, | 12:33 |
zx2c4 | then the differentiating factor is that the qemu processes are constantly being scheduled out and not allowed to run for very long, because they're all fighting for a CPU | 12:33 |
zx2c4 | which in turn means the or1k cpus execute in a different order than usual and with longer delays than usual | 12:34 |
zx2c4 | shorne: okay eventually it recovered? but too slowly, so the thing still failed https://build.wireguard.com/wireguard-linux-stable/a913f377cf1dbe90786e99ca3661e57a382c4541/or1k.log | 12:51 |
zx2c4 | So yea these hangs or whatever they are need to be fixed before this is ready | 12:54 |
*** Finde_ <[email protected]> has joined #openrisc | 13:06 | |
*** Finde <[email protected]> has quit IRC (Read error: Connection reset by peer) | 13:06 | |
zx2c4 | shorne: can you drop the wireguard patch for now? | 13:09 |
zx2c4 | Seems a bit premature | 13:10 |
*** Finde_ is now known as Finde | 14:34 | |
*** Finde <[email protected]> has quit IRC (Quit: WeeChat 2.3) | 14:35 | |
*** Finde <[email protected]> has joined #openrisc | 14:35 | |
*** arnd <[email protected]> has quit IRC (*.net *.split) | 20:01 | |
*** shorne <[email protected]> has quit IRC (*.net *.split) | 20:01 | |
*** Finde <[email protected]> has quit IRC (*.net *.split) | 20:01 | |
*** zx2c4 <zx2c4!sid204921@gentoo/developer/zx2c4> has quit IRC (*.net *.split) | 20:01 | |
*** jcm <jcm!sid410222@2a03:5180:f::6:426e> has quit IRC (*.net *.split) | 20:01 | |
*** knz <knz!~kena@2001:41d0:a:f6e9::1> has quit IRC (*.net *.split) | 20:01 | |
*** Finde <[email protected]> has joined #openrisc | 20:15 | |
*** arnd <[email protected]> has joined #openrisc | 20:15 | |
*** shorne <[email protected]> has joined #openrisc | 20:15 | |
*** jcm <jcm!sid410222@2a03:5180:f::6:426e> has joined #openrisc | 20:15 | |
*** knz <knz!~kena@2001:41d0:a:f6e9::1> has joined #openrisc | 20:15 | |
*** zx2c4 <zx2c4!sid204921@gentoo/developer/zx2c4> has joined #openrisc | 20:15 | |
shorne | zx2c4: understood, let me have a look, it's good to get these sorted out, maybe it's related to that 10 second pause issue | 20:38 |
shorne | I still did see some rcu timeouts myself before, but it did recover and the test passed | 20:40 |
zx2c4 | shorne: in this case the test failed because it took >20min | 20:56 |
shorne | zx2c4: yeah, it looks like the lockups happen when waiting for IPI requests to complete, i.e. the kernel sends a task to be done on multiple CPUs, then some CPUs don't respond back | 21:04 |
shorne | I put in some fixes for this issue, but there could still be race conditions in qemu that I am not covering | 21:04 |
shorne | it's better to track these down, I agree | 21:04 |
shorne | I would like to get most of the stuff for qemu merged upstream now though | 21:05 |
zx2c4 | Multiple pull requests seems reasonable | 21:05 |
zx2c4 | One now, one later | 21:05 |
shorne | yeah, also maybe someone will point out the issue in the PR review :) | 21:06 |