Monday, 2019-11-04

*** tpb has joined #litex 00:00
*** freemint has quit IRC 00:14
*** freemint has joined #litex 00:56
*** freeemint has joined #litex 02:22
*** freemint has quit IRC 02:23
*** CarlFK has quit IRC 02:44
*** CarlFK has joined #litex 03:28
*** CarlFK has quit IRC 05:02
*** freeemint has quit IRC 05:41
*** CarlFK has joined #litex 08:14
*** freeemint has joined #litex 09:53
*** scanakci has quit IRC 10:19
*** freeemint has quit IRC 10:43
*** freeemint has joined #litex 10:47
*** freeemint has quit IRC 11:06
*** freeemint has joined #litex 11:11
<somlo> _florent_, daveshah: over the weekend I ran a quick experiment on how performance would be affected by the data width of a future direct link between Rocket's cached-ram 'mem-axi' port and LiteDRAM: https://github.com/enjoy-digital/litex/issues/299 15:35
<daveshah> somlo: very interesting 15:40
<daveshah> impressed how big the linpack difference is 15:40
<somlo> should more or less match the "integer" performance portion of NBench 15:41
<somlo> since fp is emulated in bbl (can't fit a real FPU on ecp5) 15:41
<somlo> so, in conclusion, I'm tempted to add a new Rocket variant to litex, one with a 256-bit mem_axi port, which would access main-ram in bursts of 2 x 256bit beats, instead of 8 x 64bit, and which would perfectly match the trellisboard 15:49
<daveshah> 2 beats seems fairly low for ddr3 15:49
<somlo> during early experimentation I noticed that over the 64bit mem_axi there's always 8 accesses at a time, which tells me the cache line is 512 bits 15:51
<somlo> * L1 cache line (only cache there is on Rocket, atm) 15:51
<daveshah> oh nvm, I realise that that's 2 system clock cycles 15:54
<daveshah> which is 8 memory cycles which is the burst length of the dram 15:54
<daveshah> all makes sense 15:54
<somlo> oh, was this an "axi burst" vs. "ddr3 burst" terminology overload thing ? :) 15:55
<daveshah> yeah 15:55
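As a quick check of the beat counts discussed above, this small Python sketch divides the 512-bit L1 cache line by a few candidate mem_axi port widths. The helper name `beats_per_line` and the constant `CACHE_LINE_BITS` are illustrative assumptions, not identifiers from LiteX or Rocket.

```python
# Cache line size inferred in the discussion above (8 accesses x 64 bits).
CACHE_LINE_BITS = 512


def beats_per_line(port_width_bits, line_bits=CACHE_LINE_BITS):
    """Return how many data beats are needed to move one cache line."""
    if port_width_bits <= 0 or line_bits % port_width_bits:
        raise ValueError("port width must evenly divide the cache line size")
    return line_bits // port_width_bits


# Port widths mentioned in the conversation: 64, 128 and 256 bits.
for width in (64, 128, 256):
    print(f"{width}-bit mem_axi: {beats_per_line(width)} beats per cache line")
```

This only counts AXI beats per cache line; it says nothing about DDR3 burst length, which is the separate notion the two terms were being confused with.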
<somlo> So given that Rocket has 512-bit cache lines internally, my instincts tell me it'll always be more efficient to dump as much of it out per system clock cycle as LiteDRAM is capable of accepting, as opposed to chopping it into smaller slices internally, only to have some data-width conversion RTL reassemble it into 256-bit slices downstream, before passing it to LiteDRAM 15:59
<somlo> admittedly, the performance delta between my 128-bit measurements and the 256-bit ones is less spectacular than I would have expected :) 16:00
<daveshah> Guess it is possible that memory just isn't the bottleneck for those 16:01
<somlo> but then I also think having *some* L1 cache alleviates that difference, and if I rebuilt everything with less (or no) L1 cache on the rocket side, I'd see more of an impact 16:01
<daveshah> Rocket isn't awfully high IPC anyway 16:01
<daveshah> would be much bigger than an ecp5, but might be interesting to repeat the experiment with boom... 16:03
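To illustrate the "chop into 64-bit slices, then reassemble into 256-bit slices" path somlo describes, here is a rough behavioural model in Python. It is only a sketch: the function name `upconvert` is hypothetical, the ordering (first beat in the least significant bits) is an assumption, and it does not represent the actual converter RTL in LiteX or LiteDRAM.

```python
def upconvert(beats, in_width=64, out_width=256):
    """Pack narrow beats into wide words.

    Assumption: the first narrow beat lands in the low bits of the wide word.
    """
    if out_width % in_width:
        raise ValueError("output width must be a multiple of input width")
    ratio = out_width // in_width
    if len(beats) % ratio:
        raise ValueError("beat count must fill whole output words")
    mask = (1 << in_width) - 1
    words = []
    for i in range(0, len(beats), ratio):
        word = 0
        for j, beat in enumerate(beats[i:i + ratio]):
            # Place each narrow beat in its own in_width-bit lane.
            word |= (beat & mask) << (j * in_width)
        words.append(word)
    return words


# One 512-bit cache line as eight 64-bit beats (dummy values 1..8).
line = list(range(1, 9))
wide = upconvert(line)
print(len(wide), "wide words:", [hex(w) for w in wide])
```

With a native 256-bit mem_axi port, this repacking step would not be needed at all, which is the point of the proposed Rocket variant.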
<somlo> _florent_, daveshah: LiteX code I used in the experiment is here: https://github.com/enjoy-digital/litex/pull/300 16:15
<tpb> Title: RFC: Direct link between Rocket/mem_axi <--> LiteDRAM dataport by gsomlo · Pull Request #300 · enjoy-digital/litex · GitHub (at github.com) 16:15
<somlo> (minus the additional Rocket variants themselves, which are simply more pre-built verilog variants in rocket-litex-verilog, with additional width options for axi_mem) 16:16
<_florent_> somlo: thanks interesting, i'll have a closer look at it a bit later 16:30
*** scanakci has joined #litex 17:17
*** freeemint has quit IRC 18:24
*** freeemint has joined #litex 18:24
*** ambro718 has joined #litex 18:56
*** freeemint has quit IRC 20:01
*** freeemint has joined #litex 21:17
*** freeemint has quit IRC 22:29
*** freeemint has joined #litex 22:32
*** freeemint has quit IRC 22:42
*** ambro718 has quit IRC 23:36
