Monday, 2019-06-03

keesjinteresting problem07:46
ambro718Is it correct that CSRs can be 32-bit, but that they are accessed as four 8-bit accesses rather than atomically?18:50
daveshahHave the TrellisBoard working with LiteX VexRiscv Linux19:12
tpbTitle: GitHub - daveshah1/TrellisBoard: Ultimate ECP5 development board (at
daveshahHad to do various hacks to get 1GB of RAM working (including shuffling addresses about and modifying the address decoding)19:12
daveshahWill try and submit something a bit neater in the next few days19:12
_florent__ambro718: yes that's correct about CSRs19:39
ambro718ok thanks19:39
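A minimal Python model of why the non-atomic access matters. This is not LiteX code: the class, the function names, and the MSB-first byte order are assumptions made for illustration. It only shows that a 32-bit value read as four separate 8-bit accesses can be torn if the hardware changes it between accesses.

```python
class Counter32:
    """Hypothetical 32-bit hardware register that the core may change at any time."""

    def __init__(self, value):
        self.value = value & 0xFFFFFFFF

    def read_byte(self, index):
        # Assumption of this model: index 0 is the most significant byte.
        shift = 8 * (3 - index)
        return (self.value >> shift) & 0xFF


def read_u32_in_bytes(reg, between_accesses=None):
    """Read a 32-bit value as four 8-bit accesses, MSB first."""
    result = 0
    for i in range(4):
        result = (result << 8) | reg.read_byte(i)
        if between_accesses is not None:
            # Models time passing between two bus accesses.
            between_accesses(i)
    return result


reg = Counter32(0x0000FFFF)


def hardware_increments(i):
    # The core increments the register right after the second byte was read.
    if i == 1:
        reg.value = (reg.value + 1) & 0xFFFFFFFF


torn = read_u32_in_bytes(reg, hardware_increments)
print(hex(torn))  # neither the old value 0xffff nor the new value 0x10000
```

The two upper bytes come from the old value and the two lower bytes from the new one, so the CPU sees a value the register never held.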
_florent__daveshah: nice :) there is indeed a limitation of 256MB that i should remove19:40
_florent__daveshah: have you done the schematic/routing of the trellis board yourself? How much do you expect it to cost if you do a small batch?19:41
ambro718_florent__: I have set out to implement DMA for liteeth by going via Wishbone through the L2 cache in SoCSDRAM, that is by calling soc.sdram.add_wb_sdram_if() twice (for RX and TX DMA). Does that make sense to you?19:41
daveshah_florent__: yup did it in KiCad19:42
ambro718It's a 32-bit interface but should certainly be fast enough for 100mbit Ethernet, but questionable for 1gbit.19:42
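A rough back-of-the-envelope check of that bandwidth estimate, as a Python sketch. The clock frequency and the number of cycles per Wishbone transfer are assumptions (neither is stated in the discussion), and cache misses are ignored, so the numbers only illustrate the order of magnitude.

```python
def wishbone_throughput_bps(data_width_bits, clock_hz, cycles_per_transfer):
    """Peak throughput of a bus that moves one word every N clock cycles."""
    return data_width_bits * clock_hz / cycles_per_transfer


CLOCK_HZ = 100_000_000   # assumed system clock, not from the discussion
CYCLES_PER_TRANSFER = 2  # assumed cost of one Wishbone access

available = wishbone_throughput_bps(32, CLOCK_HZ, CYCLES_PER_TRANSFER)

utilisation = {}
for name, line_rate in (("100 Mbit", 100e6), ("1 Gbit", 1e9)):
    # RX and TX DMA share the same L2 cache, so both directions count.
    needed = 2 * line_rate
    utilisation[name] = needed / available
    print(name, "needs", round(100 * utilisation[name], 1), "% of the bus")
```

Under these assumptions 100 Mbit full duplex uses a small fraction of the bus, while 1 Gbit full duplex would need more than the bus can deliver.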
daveshahCost for the first prototype batch of 3 was $2142 total assembled from Elecrow19:42
ambro718_florent__: it will be presented as a third "interconnect" option for LiteEthMAC, "wishbone_dma", in addition to the current "crossbar" and "wishbone".19:43
sorearneat board19:43
sorearcurious what the breakdown is on that 2142 / what are the important cost drivers19:44
_florent__daveshah: ok thanks, nice work!19:48
_florent__ambro718: that seems fine, yes. The only concern I have is indeed the available bandwidth; maybe you'll have to add a packet buffer. This way, for TX, you first store packets in the core, then do the DMA over Wishbone, and the opposite for RX19:50
ambro718_florent__: LiteEthMACCore does not have any integrated buffers?19:51
daveshahsorear: about $525 for the bare board, $800 for parts and $790 for assembly19:51
_florent__ambro718: there is a fifo, but not a packet buffer19:52
_florent__ambro718: another option would be to reuse the sram writer/reader, and do a DMA on top of that19:52
daveshahFPGA about $70 each, no other massively expensive parts but they all add up in aggregate to the $267 per board19:53
ambro718_florent__: I am planning to have the CPU configure the address in DRAM where the addresses and lengths of memory buffers are stored, and push buffers (to be transmitted / available to be filled with received data) to the core by writing a number to a register (how many buffers to push)19:54
ambro718For Tx, the core would first read the memory address and buffer size from DRAM, then read the data from the referenced DRAM address and transmit it.19:54
ambro718So there would be a circular buffer of buffers, for Rx and Tx separately.19:55
ambro718there would be a status register telling you the number of buffers in each circular buffer, allowing the CPU to see how many buffers have already been transmitted / filled with received data.19:56
ambro718I think all that's necessary is that the FIFOs are large enough to deal with the latency of reading/writing data from/to DRAM but also this bookkeeping to figure out the next buffer address.19:57
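The TX side of the scheme described above can be sketched as a small Python model. This is not hardware code and not an existing LiteEth interface: the class name, the ring size, and the register names are hypothetical, and the descriptor table is modelled as a Python list rather than as words in DRAM.

```python
RING_SIZE = 4  # hypothetical number of descriptor slots in the ring


class TxDmaModel:
    """Behavioural model of a TX DMA driven by a circular buffer of buffers."""

    def __init__(self, dram, ring):
        self.dram = dram        # bytearray standing in for DRAM
        self.ring = ring        # list of (address, length) descriptors
        self.read_index = 0     # next descriptor the core will fetch
        self.pending = 0        # status register: buffers owned by the core
        self.transmitted = []   # packets "sent", recorded for inspection

    def write_push_register(self, count):
        # CPU hands `count` more descriptors to the core.
        if self.pending + count > RING_SIZE:
            raise ValueError("ring overflow")
        self.pending += count

    def step(self):
        """Process one buffer: fetch descriptor, fetch data, transmit."""
        if self.pending == 0:
            return False
        address, length = self.ring[self.read_index]
        self.transmitted.append(bytes(self.dram[address:address + length]))
        self.read_index = (self.read_index + 1) % RING_SIZE
        self.pending -= 1
        return True


dram = bytearray(64)
dram[0:5] = b"hello"
dram[16:21] = b"world"
ring = [(0, 5), (16, 5), (0, 0), (0, 0)]

dma = TxDmaModel(dram, ring)
dma.write_push_register(2)   # CPU pushes two buffers
dma.step()
pending_after_first = dma.pending
dma.step()
print(dma.transmitted, dma.pending)
```

The CPU can tell how many buffers have been transmitted by comparing what it pushed with the `pending` status value, which is why that value has to be read without tearing.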
_florent__ambro718: in fact, maybe you could just write a generic DMA for LiteX (read from wishbone, write to wishbone) and just use this for LiteEth with the current SRAM buffers19:57
keesjwhat kind of crazy projects are you doing ambro718 ?19:57
_florent__ambro718: this way you already have the circular buffers19:57
keesjreally cool19:58
ambro718_florent__: You are completely correct, that's what I'll do. Write DMA core will read from stream and write to WB, and read DMA core will read from DRAM via WB and write to stream.19:59
keesjdaveshah: also .. very uber cool20:00
ambro718keesj: I'm just trying to get ethernet working properly for linux-on-litex-vexriscv, for fun :)20:00
ambro718_florent__: the bandwidth problem could be addressed by making the WB interface of the L2 cache wider?20:01
ambro718There is no simple way to expose a 16-bit status register on CSR atomically?20:05
ambro718I need to expose a status for the current number of buffers the dma is responsible for, and that being read out incorrectly is not an option.20:07
ambro718_florent__: wait I misunderstood you, why would I use the current SRAM buffers?20:09
ambro718Data should go automatically to/from DRAM to the right places without the CPU having to start a DMA transfer for each packet (that's inefficient).20:11
ambro718Seems a bit wasteful to transfer to SRAM first and then from SRAM to DRAM.20:13
ambro718It should go straight from/to DRAM but have a large enough FIFO to work.20:14
_florent__yes the generic DMA solution is less efficient20:21
keesjthe next leap is litex on linux self hosting20:21
_florent__with a large FIFO, your solution should work fine20:22
ambro718It's still going to be generic but in the sense that it will allow you to write data from any stream into DRAM based on a circular buffer of buffers itself managed in DRAM (and similar for read).20:23
_florent__ambro718: if that's not fast enough, we'll see how to improve the bandwidth20:23
ambro718Also seems less fragile (fewer pieces).20:24
ambro718How will inheriting an internal helper base class work in a class inheriting Module? I want to put some things common to Rx and Tx into a base class.20:42
ambro718What's the best way to define a CSR that allows the CPU to send a simple signal to the core?20:56
ambro718One write by CPU results in the core seeing the signal exactly one clock cycle.20:57
ambro718Do litex cores need custom implementation of reset?21:06
_florent__ambro718: for inheriting, you can create a common class and inherit from it in your Rx/Tx classes, that's similar to how you would do it in other Python scripts21:19
_florent__ambro718: If you want to trigger some logic on CPU writes, you can create a CSR() and then use .r/.re21:20
_florent__ambro718: for the reset, no need to implement it, you can use ResetInserter on top of your module21:20
ambro718I need a signal from the CPU to make the core update the value in a different 16-bit CSR so it can be read atomically.21:22
ambro7181-bit CSRStatus? Or will 0 bit work?21:22
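The idea of a trigger write that latches a 16-bit value for a tear-free read can be modelled in plain Python. This is only a behavioural sketch under assumptions: the class and method names are hypothetical, the byte order is MSB first, and the one-cycle strobe stands in for what a write strobe such as `.re` would provide in the real core.

```python
class SnapshotModel:
    """Live 16-bit counter plus a latched copy that the CPU reads byte by byte."""

    def __init__(self):
        self.live = 0        # counter changed by the core every cycle
        self.snapshot = 0    # latched copy, exposed as the status value
        self.strobe = False  # high for exactly one cycle after a CPU write

    def cpu_write_trigger(self):
        # One CPU write raises the strobe for the next clock edge only.
        self.strobe = True

    def clock(self, delta=0):
        # Registers sample the OLD values: the snapshot captures `live`
        # before `live` itself is updated on this edge.
        if self.strobe:
            self.snapshot = self.live
        self.strobe = False
        self.live = (self.live + delta) & 0xFFFF

    def cpu_read_snapshot_byte(self, index):
        # Assumption of this model: index 0 is the most significant byte.
        return (self.snapshot >> (8 * (1 - index))) & 0xFF


m = SnapshotModel()
m.live = 0x00FF

m.cpu_write_trigger()
m.clock(delta=1)                    # snapshot latched, live becomes 0x0100
hi = m.cpu_read_snapshot_byte(0)
m.clock(delta=1)                    # live keeps changing between the reads
lo = m.cpu_read_snapshot_byte(1)

value = (hi << 8) | lo
print(hex(value), hex(m.live))
```

Both bytes come from the same latched value, so the result is consistent even though the live counter moved between the two accesses.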
ambro718What happens if I assign to a signal multiple times in sync? In particular, I have some logic that conditionally increments it by some number, and some logic that conditionally decrements it by one.22:02
ambro718Will this magically work or do I need special handling?22:02
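A plain Python model of the question above. It assumes the usual semantics of synchronous assignments, which Migen is expected to follow like Verilog non-blocking assignments: every statement reads the old register value, and when several assignments to the same signal apply in one cycle, the last one wins. The function names are hypothetical.

```python
def next_last_wins(count, add_en, add_n, dec_en):
    """Two separate conditional assignments to the same register.

    Both read the OLD value of `count`; the later assignment overrides
    the earlier one when both conditions are true in the same cycle.
    """
    nxt = count
    if add_en:
        nxt = count + add_n
    if dec_en:
        nxt = count - 1
    return nxt


def next_combined(count, add_en, add_n, dec_en):
    """A single assignment that accounts for both events in one expression."""
    return count + (add_n if add_en else 0) - (1 if dec_en else 0)


lost = next_last_wins(10, add_en=True, add_n=3, dec_en=True)
correct = next_combined(10, add_en=True, add_n=3, dec_en=True)
print(lost, correct)
```

Under that assumption it does not work by itself: when an increment and a decrement coincide, the increment is lost. Computing one next value that covers both cases avoids this.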
