Monday, 2019-11-04

*** tpb has joined #litex		00:00
*** freemint has quit IRC		00:14
*** freemint has joined #litex		00:56
*** freeemint has joined #litex		02:22
*** freemint has quit IRC		02:23
*** CarlFK has quit IRC		02:44
*** CarlFK has joined #litex		03:28
*** CarlFK has quit IRC		05:02
*** freeemint has quit IRC		05:41
*** CarlFK has joined #litex		08:14
*** freeemint has joined #litex		09:53
*** scanakci has quit IRC		10:19
*** freeemint has quit IRC		10:43
*** freeemint has joined #litex		10:47
*** freeemint has quit IRC		11:06
*** freeemint has joined #litex		11:11
somlo	_florent_, daveshah: over the weekend I ran a quick experiment on how performance would be affected by the data width of a future direct link between Rocket's cached-ram 'mem-axi' port and LiteDRAM: https://github.com/enjoy-digital/litex/issues/299	15:35
daveshah	somlo: very interesting	15:40
daveshah	impressed how big the linpack difference is	15:40
somlo	should more or less match the "integer" performance portion of NBench	15:41
somlo	since fp is emulated in bbl (can't fit a real FPU on ecp5)	15:41
somlo	so, in conclusion, I'm tempted to add a new Rocket variant to litex, one with a 256-bit mem_axi port, which would access main-ram in bursts of 2 x 256-bit beats instead of 8 x 64-bit, and which would perfectly match the trellisboard	15:49
daveshah	2 beats seems fairly low for ddr3	15:49
somlo	during early experimentation I noticed that over the 64-bit mem_axi there are always 8 accesses at a time, which tells me the cache line is 512 bits	15:51
somlo	* L1 cache line (only cache there is on Rocket, atm)	15:51
daveshah	oh nvm, I realise that that's 2 system clock cycles	15:54
daveshah	which is 8 memory cycles which is the burst length of the dram	15:54
daveshah	all makes sense	15:54
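The beat arithmetic in the exchange above can be checked with a few lines of Python. This is a minimal sketch, not code from LiteX or Rocket: the helper names are hypothetical, the 512-bit line size is the one somlo inferred from 8 x 64-bit accesses, and the 4:1 ratio of memory cycles to system clock cycles is simply read off daveshah's "2 system clock cycles = 8 memory cycles" remark. It also assumes one AXI beat per system clock.

```python
# Hypothetical helpers illustrating the AXI-beat vs. DDR3-burst arithmetic from the chat.

CACHE_LINE_BITS = 512  # inferred in the chat from 8 beats x 64 bits on mem_axi

# Assumption read off daveshah's remark: 2 system clock cycles = 8 memory cycles,
# i.e. 4 memory cycles per system clock cycle.
MEM_CYCLES_PER_SYS_CLK = 4
DDR3_BURST_LENGTH = 8  # DDR3 uses burst length 8 (BL8)


def axi_beats(cache_line_bits: int, port_width_bits: int) -> int:
    """Number of AXI data beats needed to move one cache line over a port."""
    if cache_line_bits % port_width_bits != 0:
        raise ValueError("port width must evenly divide the cache line")
    return cache_line_bits // port_width_bits


def memory_cycles(sys_clk_cycles: int) -> int:
    """Convert system clock cycles to memory cycles using the assumed ratio."""
    return sys_clk_cycles * MEM_CYCLES_PER_SYS_CLK


if __name__ == "__main__":
    for width in (64, 128, 256):
        print(f"{width}-bit port: {axi_beats(CACHE_LINE_BITS, width)} beats per cache line")
    # Assuming one beat per system clock, the 2 beats of a 256-bit port span
    # 2 system clock cycles, which under the assumed ratio is one DDR3 burst.
    beats = axi_beats(CACHE_LINE_BITS, 256)
    print(memory_cycles(beats) == DDR3_BURST_LENGTH)
```

Under these assumptions, the "2 beats" that looked short for DDR3 is an AXI-level count; in memory cycles it covers exactly one BL8 DRAM burst, which is the terminology overload somlo points out next.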
somlo	oh, was this an "axi burst" vs. "ddr3 burst" terminology overload thing ? :)	15:55
daveshah	yeah	15:55
somlo	So given that Rocket has 512-bit cache lines internally, my instincts tell me it'll always be more efficient to dump as much of it out per system clock cycle as LiteDRAM is capable of accepting, as opposed to chopping it into smaller slices internally, only to have some data-width conversion RTL reassemble it into 256-bit slices downstream, before passing it to LiteDRAM	15:59
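somlo's point about slicing and reassembling can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not LiteX's data-width converter RTL: the function names are hypothetical, and the least-significant-slice-first ordering is an assumed convention. The model shows that cutting a 512-bit line into 64-bit slices and up-converting them to 256 bits only reproduces the two 256-bit slices that a 256-bit port would have emitted directly.

```python
# Hypothetical behavioral model (not LiteX RTL) of a 512-bit cache line being
# split into narrow slices and then regrouped by an up-converting width converter.
from typing import List


def split(word: int, total_bits: int, slice_bits: int) -> List[int]:
    """Split an integer into slices, least-significant slice first (assumed order)."""
    if total_bits % slice_bits != 0:
        raise ValueError("slice width must evenly divide the word")
    mask = (1 << slice_bits) - 1
    return [(word >> (i * slice_bits)) & mask for i in range(total_bits // slice_bits)]


def join(slices: List[int], slice_bits: int) -> int:
    """Concatenate slices back into one integer, least-significant slice first."""
    word = 0
    for i, s in enumerate(slices):
        word |= s << (i * slice_bits)
    return word


def upconvert(slices: List[int], in_bits: int, out_bits: int) -> List[int]:
    """Regroup narrow slices into wider ones, as an up-converter would."""
    if out_bits % in_bits != 0:
        raise ValueError("output width must be a multiple of input width")
    ratio = out_bits // in_bits
    if len(slices) % ratio != 0:
        raise ValueError("slice count must be a multiple of the width ratio")
    return [join(slices[i:i + ratio], in_bits) for i in range(0, len(slices), ratio)]


if __name__ == "__main__":
    line = int.from_bytes(bytes(range(64)), "little")  # one 512-bit cache line
    narrow = split(line, 512, 64)       # 8 x 64-bit beats, as on the 64-bit mem_axi
    wide = upconvert(narrow, 64, 256)   # reassembled into 2 x 256-bit slices
    print(len(narrow), len(wide), wide == split(line, 512, 256))
```

The round trip adds conversion logic without changing the data, which is the intuition behind exporting the wider slices straight from Rocket.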
somlo	admittedly, the performance delta between my 128-bit measurements and the 256-bit ones is less spectacular than I would have expected :)	16:00
daveshah	Guess it is possible that memory just isn't the bottleneck for those	16:01
somlo	but then I also think having some L1 cache alleviates that difference, and if I rebuilt everything with less (or no) L1 cache on the rocket side, I'd see more of an impact	16:01
daveshah	Rocket isn't awfully high IPC anyway	16:01
daveshah	would be much bigger than an ecp5, but might be interesting to repeat the experiment with BOOM...	16:03
somlo	_florent_, daveshah: LiteX code I used in the experiment is here: https://github.com/enjoy-digital/litex/pull/300	16:15
tpb	Title: RFC: Direct link between Rocket/mem_axi <--> LiteDRAM dataport by gsomlo · Pull Request #300 · enjoy-digital/litex · GitHub (at github.com)	16:15
somlo	(minus the additional Rocket variants themselves, which are simply more pre-built verilog variants in rocket-litex-verilog, with additional width options for axi_mem)	16:16
_florent_	somlo: thanks interesting, i'll have a closer look at it a bit later	16:30
*** scanakci has joined #litex		17:17
*** freeemint has quit IRC		18:24
*** freeemint has joined #litex		18:24
*** ambro718 has joined #litex		18:56
*** freeemint has quit IRC		20:01
*** freeemint has joined #litex		21:17
*** freeemint has quit IRC		22:29
*** freeemint has joined #litex		22:32
*** freeemint has quit IRC		22:42
*** ambro718 has quit IRC		23:36