Thursday, 2021-01-28

*** tpb has joined #litex00:00
*** lf has quit IRC00:43
*** lf_ has joined #litex00:43
*** Bertl_oO has quit IRC00:49
*** peeps[zen] has joined #litex00:59
*** peepsalot has quit IRC01:00
*** peeps[zen] is now known as peepsalot01:00
*** st-gourichon-f has joined #litex01:15
*** st-gourichon-fid has quit IRC01:16
*** tpb has joined #litex02:21
*** carlomaragno has quit IRC02:21
*** carlomaragno has joined #litex02:22
*** Bertl_oO has joined #litex02:52
*** lkcl has quit IRC03:21
*** Degi_ has joined #litex03:43
*** lkcl has joined #litex03:44
*** Degi has quit IRC03:44
*** Degi_ is now known as Degi03:44
*** peepsalot has quit IRC05:02
*** peeps has joined #litex05:05
*** peepsalot has joined #litex05:13
*** peeps has quit IRC05:15
*** kgugala has quit IRC06:20
*** kgugala has joined #litex06:21
*** Bertl_oO is now known as Bertl_zZ06:40
*** lkcl has quit IRC07:27
*** kgugala has quit IRC07:31
*** kgugala has joined #litex07:31
*** lkcl has joined #litex07:39
*** kgugala_ has joined #litex07:48
*** kgugala has quit IRC07:52
_florent_acathla: zyp's explanations make sense to me, if you want to ease investigation on this, you could also use litex_sim with --trace. This would give you direct visibility on all signals of the SoC08:20
_florent_Also not sure you have an instruction cache on your CPU, but if not, adding one (even a small one) could help08:20
acathla_florent_, I use vexriscv-minimal from LiteX, I don't know if there is an instruction cache.08:22
_florent_acathla: Litescope is nice for things that are difficult to simulate but here I would look at this in simulation08:24
_florent_VexRiscv-minimal has the caches disabled IIRC08:24
acathlaI understand, but I don't know well how it works yet. I don't know how to add something to the simulation, for example08:24
acathlaunless it's available as an option08:25
_florent_acathla: litex_sim simulation is really similar to the target you are running on hardware, it's possible to customize it, add your own peripherals, etc...08:26
_florent_I can try to get you started on this, I'm going to try to reproduce the behaviour you see in simulation08:27
acathlaOh, nice :)08:28
acathlaI'm not sure it's critical on the versa, as zyp said, but on the fomu it gets worse and the clock is slower, so we end up with a 32-bit microcontroller slower than everything08:29
_florent_ok, so with a minor modification to  the BIOS to reproduce your behavior:08:37
_florent_https://www.irccloud.com/pastebin/tEo539RX/08:37
tpbTitle: Snippet | IRCCloud (at www.irccloud.com)08:37
_florent_rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=minimal --trace08:38
_florent_gtkwave build/sim/gateware/sim.vcd.08:38
_florent_you'll get this:08:39
_florent_https://usercontent.irccloud-cdn.com/file/Ntaxggr1/Screenshot%20from%202021-01-28%2009-39-14.png08:39
_florent_which seems to exhibit the same behavior you are seeing on hardware, except that it takes a few seconds to reproduce :)08:40
_florent_now let's try with other variants of VexRiscv08:40
_florent_rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=lite --trace08:41
_florent_the behaviour is now different:08:44
_florent_https://usercontent.irccloud-cdn.com/file/jx2VROpm/Screenshot%20from%202021-01-28%2009-44-24.png08:44
acathlaHum, I cannot decode the matrix that easily yet...08:47
_florent_and also different with standard: rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=standard --trace:08:47
_florent_https://usercontent.irccloud-cdn.com/file/GdUWqgXb/Screenshot%20from%202021-01-28%2009-47-17.png08:47
_florent_where we no longer see instruction bus accesses during the while(1) puts("h") loop since instructions are in the cache08:48
acathlaBut the standard vex cannot fit in an iCE40up5k, right?08:50
_florent_acathla: sure that's not easy to decode when not familiar with it, but it was just to show you that the simulation could be useful for this work and to understand the performance issue and how to improve/avoid it08:51
*** kgugala has joined #litex08:51
acathlaOk, thank you.08:51
*** kgugala_ has quit IRC08:53
acathlaOh, standard vexriscv fits in a fomu without USB08:53
acathlaSince I don't need USB but Infrared, may be it can all fit.08:54
*** kgugala_ has joined #litex08:55
_florent_if you want to see nice optimizations on iCE40 with VexRiscv and a specific cache using the SPRAM, I would recommend looking at https://twitter.com/esden/status/1354568108510388232 :)08:55
acathlaomg Doom with sound!08:56
*** kgugala has quit IRC08:58
acathla_florent_, so you just said that for nice optimizations I should forget about litex and go verilog...09:26
acathla=)09:27
_florent_acathla: that's not necessarily what I'm saying :), but LiteX covers various cases and has to be generic enough to do so. It should provide a good basis and trade-off between genericity and performance, but if you really want to push performance, the first thing is to understand the real bottleneck and then maybe do some optimizations09:31
acathlaThat's a lot of work...09:32
_florent_what is a lot of work?09:33
acathlaSPI RAM seems to be a nice idea09:33
acathla_florent_, understand every level to optimize09:33
acathlaand my job is to make a robot with IR communication, from scratch, better than the kilobot.09:34
_florent_ah not necessarily every level, but at least the bottleneck for your use case09:34
_florent_If LiteX provides you 90% of what you need and you just need to plug in a custom module (specific cache, SPI RAM), that can still be interesting compared to verilog, but for sure it relies on the same principles as other SoCs and has the same limitations09:38
zypcustom interconnect, perhaps09:40
_florent_I was sharing the Doom design on iCEBreaker since I find it a good example of what you can do in a constrained environment with things optimized as much as they possibly can be09:40
zypyou'd probably want something more efficient than a plain shared interconnect, but smaller than a full crossbar09:41
acathlaI'm not sure I could explain what's a shared interconnect or a full crossbar09:45
zyp_florent_, maybe a general reduced crossbar would be useful? I figure in a small system the only masters would be I-bus and D-bus and the I-bus would probably only need to access one or two of the slaves09:45
zypacathla, the interconnect is what connects wishbone masters to wishbone slaves -- a shared interconnect can be accessed by one master at a time09:46
zypso there's an arbiter to select which master to serve followed by an address decoder to select which slave to access09:48
acathlausually there is only one master, right? Unless we add a bridge to debug things.09:48
_florent_zyp: yes, that would indeed be useful09:48
zypacathla, no, the cpu alone has two; ibus and dbus09:48
acathlato be sure : does that mean instruction-bus and data-bus?09:49
zypacathla, part of what you're seeing is that when the dbus wants to access CSR, it has to wait because the interconnect is kept busy by the ibus accessing code09:49
zypyes09:49
acathlaCan't we make the SPRAM accessible only by the CPU so it doesn't have to wait?09:50
zypa crossbar is a larger interconnect where each master has its own address decoder and each slave has its own arbiter so that masters only have to wait if multiple are trying to access the same slave at a time09:50
zypso it gets a lot bigger than a shared interconnect09:51
zypand a reduced crossbar is an optimization where each slave can only be used by the masters that actually need it09:52
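A minimal sketch of the difference zyp describes between a shared interconnect and a full crossbar, written as a toy cycle model in plain Python. It is not LiteX or Migen code: the function name `simulate`, the slave names and the round-robin policy are assumptions made only for illustration.

```python
# Toy cycle model (illustration only, not LiteX code).
# Each master tries one access per cycle; an access that cannot be
# served in the current cycle is retried later and counted as a stall.

def simulate(requests, mode):
    """requests: dict mapping master name -> list of slave names to access.
    mode: "shared"   -> only one master is served per cycle overall,
          "crossbar" -> one master is served per cycle *per slave*.
    Returns (total_cycles, stalls_per_master)."""
    pending = {m: list(reqs) for m, reqs in requests.items()}
    stalls = {m: 0 for m in requests}
    order = list(requests)            # round-robin priority order
    cycles = 0
    while any(pending.values()):
        cycles += 1
        busy_slaves = set()           # slaves already granted this cycle
        bus_taken = False             # shared bus already granted this cycle
        for master in order:
            if not pending[master]:
                continue
            slave = pending[master][0]
            if mode == "shared":
                blocked = bus_taken
            else:
                blocked = slave in busy_slaves
            if blocked:
                stalls[master] += 1
            else:
                pending[master].pop(0)
                busy_slaves.add(slave)
                bus_taken = True
        order.append(order.pop(0))    # rotate priority for the next cycle
    return cycles, stalls


# ibus fetches code from spram while dbus talks to CSRs.
traffic = {"ibus": ["spram"] * 4, "dbus": ["csr"] * 4}
print(simulate(traffic, "shared"))
print(simulate(traffic, "crossbar"))
```

In this model the two masters never target the same slave, so the crossbar serves both every cycle while the shared bus makes them take turns, which is the kind of waiting described for the dbus above.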
acathlaOk.09:53
zypthe dbus generally needs to be able to access everything, because it might need to fetch data that's embedded in the code09:54
acathlaFor dbus, I understand, but ibus09:54
zypbut in your case, the ibus probably doesn't need to access anything other than spram09:54
acathlaI agree (and understand, yay!)09:55
zypi.e. you could put a decoder on the dbus to let it access everything, and then an arbiter only in front of the spram, to let it be accessed either by the ibus directly, or by the dbus decoder09:56
zypthat way there would only be slowdowns when the dbus needs to access the spram09:57
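The reduced topology zyp outlines (a decoder on the dbus, and an arbiter only in front of the spram) can be sketched as a one-cycle routing function. This is a plain Python model, not LiteX code; the memory map in `REGIONS`, the function names and the `prefer` tie-break are hypothetical and chosen only to make the example self-contained.

```python
# Hypothetical memory map for illustration; real bases and sizes come
# from the generated SoC.
REGIONS = {
    "spram": (0x10000000, 0x00020000),
    "csr":   (0x82000000, 0x00010000),
}

def decode(addr):
    """Address decoder: map an address to the slave that owns it."""
    for name, (base, size) in REGIONS.items():
        if base <= addr < base + size:
            return name
    raise ValueError(f"no slave at {addr:#010x}")

def step(ibus_addr, dbus_addr, prefer="ibus"):
    """Model one bus cycle. An address of None means the master is idle.
    Returns a dict of the masters served this cycle and their slave."""
    granted = {}
    if ibus_addr is not None:
        # In this topology the ibus is wired to the spram only.
        if decode(ibus_addr) != "spram":
            raise ValueError("ibus can only reach the spram here")
        granted["ibus"] = "spram"
    if dbus_addr is not None:
        target = decode(dbus_addr)    # the dbus can reach every slave
        if target == "spram" and "ibus" in granted:
            # Contention on the spram arbiter: exactly one master wins.
            if prefer == "dbus":
                del granted["ibus"]
                granted["dbus"] = "spram"
            # otherwise the dbus simply waits for a later cycle
        else:
            granted["dbus"] = target
    return granted


print(step(0x10000000, 0x82000004))   # code fetch and CSR access together
print(step(0x10000000, 0x10000100))   # both want the spram: one waits
```

Only the second call shows a slowdown, matching the point that contention remains solely when the dbus needs the spram.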
acathlaI understand but not sure I can do that. It already took me hours to try to understand how the UART works09:58
acathlaI guess we need to modify the CPU itself as it has only a wishbone interface (well, reconfigure it as it is generated)10:09
acathlaor the wishbone interface is okay10:10
_florent_if you are using VexRiscv, it already has separate ibus/dbus10:10
*** lkcl has quit IRC10:18
*** lkcl has joined #litex10:31
*** Zguig has joined #litex10:44
_florent_acathla: here is a quick test to add a direct connection between the ibus of VexRiscv and a peripheral (here the ROM):10:55
_florent_https://www.irccloud.com/pastebin/ISAUB91o/10:55
tpbTitle: Snippet | IRCCloud (at www.irccloud.com)10:55
_florent_you could also extend it to have access to other peripherals10:56
_florent_this removes the ibus/dbus bottleneck you were seeing with vexriscv minimal:10:57
_florent_https://usercontent.irccloud-cdn.com/file/IYbCEht9/Screenshot%20from%202021-01-28%2011-56-52.png10:58
leons_florent_: do you happen to accept Git patches to LiteX via email or prefer GitHub PRs?10:59
acathla_florent_, thank you. First time I see pop or Arbiter...11:06
acathlaCan I simply replace rom with spram?11:07
*** Zguig has quit IRC11:08
zyprom is implemented as spram, I'd guess11:09
acathlaIt's in SPI flash by default11:16
acathlabut the code is moved to ram at start11:16
acathla_florent_, how could it work since the ibus is not connected anymore to the ram?11:25
acathlaoh, probably no instruction goes into ram in the sim11:25
acathlawhy do you put in masters (for the Arbiter) the sram itself?11:31
zypacathla, rom starts out being connected to the shared interconnect, line 37 in the patch disconnects it and reconnects it with the new Arbiter in between11:36
zypacathla, https://bit.ly/2NF6I0R <- here's a quick illustration11:52
tpbTitle: Graphviz Online (at bit.ly)11:52
futarisIRCcloudhttps://youtu.be/3ZBAZ5QoCAk12:08
*** kgugala_ has quit IRC12:18
*** kgugala has joined #litex12:18
_florent_acathla: yes you can also add the spram to the arbiter12:20
keesjnice video12:21
_florent_leons: I have a preference for PRs but also accept patches via email12:21
leons_florent_: That's great to hear! I've already started a PR for this one, but might resort to patches in the future since they tend to better integrate with my workflow and I don't have to use GitHub then :)12:24
*** FFY00 has quit IRC12:55
*** Bertl_zZ is now known as Bertl13:12
acathlazyp, thank you for the illustration.13:30
acathlaI added two Arbiters, as that seemed logical, one for the ROM (spiflash) and one for the spram, but it does not seem to work.13:31
*** lkcl has quit IRC13:46
*** lkcl has joined #litex13:47
_florent_acathla: ah sorry, the Arbiter will not work with two slaves, you'll need to use the InterconnectShared, 2s13:49
_florent_something like this should work:13:51
_florent_https://www.irccloud.com/pastebin/UCCQdBql/13:51
tpbTitle: Snippet | IRCCloud (at www.irccloud.com)13:51
zypseems a bit roundabout to have two slave ports on the main interconnect hook up to two master ports on the second interconnect :)13:52
acathla_florent_, it boots!14:07
_florent_zyp: indeed, that's just a quick workaround until we could support it natively :)14:11
acathlaAnd it finally works! The code was able to fill the UART FIFO so I could send a full frame instead of separate bytes!14:16
acathlathank you _florent_ & zyp14:16
_florent_great14:17
futarisIRCcloudhttps://bostonarch.github.io/2021/14:18
tpbTitle: BARC 2021 (at bostonarch.github.io)14:18
*** Zguig has joined #litex14:39
ZguigHi _florent_, just did a commit related to Linux-vexrisc/ECPIX-5 Board and saw that now there is an L2 parameter set to 2048. This is making the board much slower at boot. Did some tests with and without this parameter: 4 secs without it and here are the numbers: 72 secs until the "random: dd" message with it vs 17 secs without this parameter.14:42
ZguigIs it something normal and expected?14:42
_florent_Zguig: I added this to fix issues with the OrangeCrab and boards that don't support DMs14:46
_florent_this should impact performance a bit, but not that much14:46
_florent_I'm going to do a test14:46
ZguigI thought I had broken everything first, but after being patient it boots until the end. Did a test with same code and only commenting the parameter for the board and everything is much faster14:51
*** SpaceCoaster_ has joined #litex15:30
*** SpaceCoaster has quit IRC15:30
_florent_Zguig: I just reverted the ECPIX5 to use direct LiteDRAM interface, with the L2 cache I get [   23.603248] random, with the direct LiteDRAM interface: [    8.715306] random15:51
*** FFY00 has joined #litex16:10
*** rohitksingh has quit IRC16:36
*** rohitksingh has joined #litex16:37
*** alanvgreen has quit IRC16:37
*** alanvgreen has joined #litex16:37
*** Zguig has quit IRC16:53
*** kgugala has quit IRC17:07
*** kgugala has joined #litex17:07
somlo_florent_: commit 2287f739 is very interesting, am I really able to read/write CSRs over jtag while Linux is running on my rocket/litex rig?17:57
somloand to be precise, the "address" is relative to the soc bus start, so basically an offset as far as the cpu's view of a full MMIO address would be17:58
*** FFY00 has quit IRC18:15
*** FFY00 has joined #litex18:18
somlo_florent_: so I tried using my jtag_bone enabled rocket SoC (on nexys4ddr), with the litex_server (same setup that works with litescope_cli)18:21
somloand whatever register I'm reading always returns `0xc3bfc3bf`18:21
somlowhether I write anything (else) to it beforehand or not :)18:22
*** ranzbak has quit IRC18:39
_florent_somlo: litex_cli --read/--write just provides a simple way to do accesses to the bus of the SoC (with no address translation)19:01
_florent_somlo: if litescope_cli works, this should also work19:02
_florent_with the csr.csv of the SoC in the same directory, you can try litex_cli --regs, this will make a dump of all the available registers19:02
somloaha, got it, the address *is* absolute, as shown by `--regs` output19:05
somlobut weirdly, if I write `litex_client --write 0x12000004 0x12345678` then `litex_client --read 0x12000004` will return what I wrote; but if I write something else, e.g. 0x123456ff, I read back 0x123456c319:08
somlothere's some bit masking going on at least, if not something worse...19:08
acathlasomlo, you must write in a register where you can write, or in RAM where a program is not also writing19:09
somloI'm writing to the scratch register, which the linux driver only accesses once during boot, then leaves alone from that point on19:10
somlo0x12000004 is the scratch CSR on my litex/rocket SoC19:11
acathlaOk. You can try to write to RAM, in the middle.19:12
*** ranzbak has joined #litex19:13
somlonot sure how reading ram would work on rocket (ram is connected to a dedicated point-to-point axi interface on the rocket chip, not shared on the same bus where the CSRs are located). So I wanted to start with baby steps -- the scratch CSR, like in the commit log example :)19:13
acathlaWhy do people use AXI?19:15
*** Bertl is now known as Bertl_oO19:16
somloacathla: I use it because Rocket exposes it as its interface with the outside world :)19:19
somlothe actual pro / con between axi and wishbone is a whole different topic :)19:19
somloanyway, I prevented the thing from booting into linux, got it at the bios prompt19:33
somlowriting 0xffff to the scratch register reads back 0xc3bf, but it's not a straightforward bit mask that somehow luckily avoids affecting 0x12345678, but something a bit weirder than that...19:34
somlo_florent_: not sure it's only specific to rocket, haven't tried a different cpu / memory map / bus/memory interconnect scheme19:35
_florent_somlo: strange, this would need to be investigated... A good use case for Litescope now that you are familiar with it :)20:18
_florent_I could look at it tomorrow otherwise20:18
somlo_florent_: not sure how litescope would help, but I tried `mem_read` and `mem_write` from the litex bios prompt20:42
somloinitially, read: "0x12000004  78 56 34 12" (from 0x12000004)20:43
somlowrote 0x0000ffff, read back "0x12000004  ff ff 00 00" -- so it seems to work fine. It's just through the client via server and jtag interface that it turns out weird20:44
somlo_florent_: writing via litex_client -> server -> jtag, I can `mem_read` the right values from the bios prompt, so writes work via jtag20:47
somloreading back (via jtag) is what's getting messed up20:47
_florent_somlo: ok, could you try lowering  adapter_khz in the openocd config file (in prog/openocd_xc7_ftxy.cfg)20:48
somloit's 2500 now, what's a good test value?20:49
somlo15000?20:49
somloha20:49
somlonah, for a second I thought I'd gotten it, but nope, writes work, reads are erratic and mostly non-sensical, even with as low as 500020:51
somloso I tried the default 25000, then 15000, then 5000, then 500 - and same result, writing works (can confirm by `mem_read` on bios prompt); reading back is all over the place, now I'm getting 0xc3bfc39e (there's that c3bf pattern again)20:55
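One observation about the values reported above, offered as a hypothesis and not something confirmed in this log: 0xc3 0xbf is the UTF-8 encoding of the byte value 0xff. The sketch below models a 32-bit read whose bytes are treated as Latin-1 text, re-encoded as UTF-8 and truncated back to four bytes. The function name `utf8_mangle` is made up for illustration, and nothing here says where in the client, server or JTAG path such a conversion would happen.

```python
def utf8_mangle(word):
    """Model a 32-bit value whose 4 big-endian bytes get re-encoded as
    UTF-8 (as if they were Latin-1 characters) and are then cut back
    to the first 4 bytes. Hypothetical model, not LiteX code."""
    raw = word.to_bytes(4, "big")
    mangled = raw.decode("latin-1").encode("utf-8")
    return int.from_bytes(mangled[:4], "big")


# Bytes below 0x80 encode to themselves, so this value survives intact.
print(hex(utf8_mangle(0x12345678)))
# The trailing 0xff expands to c3 bf and the last byte is cut off.
print(hex(utf8_mangle(0x123456ff)))
# Writing 0xffff to the scratch register.
print(hex(utf8_mangle(0x0000ffff)))
```

Under this model the three write/read pairs quoted in the log come out as reported, and any word starting with 0xffff would read as 0xc3bfc3bf, which fits the recurring c3bf pattern without proving the cause.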
_florent_ok, I started really using the jtagbone today but haven't seen this behaviour, I'll continue more testing tomorrow21:03
*** feldim2425_ has joined #litex23:14
*** feldim2425 has quit IRC23:14
*** feldim2425_ is now known as feldim242523:15
*** Claude has quit IRC23:21
*** vup has quit IRC23:31
*** vup has joined #litex23:32
*** y2kbugger has quit IRC23:36
*** esden has quit IRC23:36
*** y2kbugger has joined #litex23:37
*** esden has joined #litex23:37
*** tannewt has quit IRC23:37
*** tannewt has joined #litex23:39
