Tuesday, 2020-06-30

*** tpb has joined #litex00:00
*** lf has quit IRC00:02
*** lf has joined #litex00:02
*** st-gourichon-f has joined #litex00:03
*** st-gourichon-fid has quit IRC00:04
gregdavill_florent_: Thanks for looking into that so quickly! I've pulled your litedram changes and altered the CSR module in my design with the external reset signal. It's looking good here.00:33
gregdavillAn interesting observation, I'm getting different bitslip results if I load my design via JTAG, compared to if it's loaded from FLASH.00:35
*** Degi has quit IRC01:53
*** Degi has joined #litex01:54
*** jaseg has quit IRC02:15
*** jaseg has joined #litex02:16
*** guan has quit IRC02:36
*** bubble_buster has quit IRC02:36
*** mithro has quit IRC02:37
*** levi has quit IRC02:37
*** mithro has joined #litex02:38
*** bubble_buster has joined #litex02:39
*** guan has joined #litex02:41
*** levi has joined #litex02:44
*** m4ssi has joined #litex05:07
*** kgugala_ has joined #litex05:08
*** kgugala has quit IRC05:09
*** m4ssi has quit IRC06:23
*** st-gouri- has joined #litex06:47
*** st-gourichon-f has quit IRC06:50
*** kgugala has joined #litex07:11
*** kgugala_ has quit IRC07:15
*** gregdavill has quit IRC07:42
*** leons has quit IRC08:38
*** CarlFK[m] has quit IRC08:38
*** disasm[m] has quit IRC08:38
*** sajattack[m] has quit IRC08:38
*** nrossi has quit IRC08:38
*** david-sawatzke[m has quit IRC08:38
*** john_k[m] has quit IRC08:38
*** xobs has quit IRC08:38
*** david-sawatzke[m has joined #litex08:52
*** gregdavill has joined #litex09:03
*** xobs has joined #litex09:24
*** disasm[m] has joined #litex09:24
*** CarlFK[m] has joined #litex09:24
*** john_k[m] has joined #litex09:24
*** sajattack[m] has joined #litex09:24
*** nrossi has joined #litex09:24
*** leons has joined #litex09:24
*** kgugala_ has joined #litex09:53
*** scanakci has quit IRC09:55
*** kgugala_ has quit IRC09:55
*** kgugala has quit IRC09:55
*** kgugala has joined #litex09:55
_florent_gregdavill: indeed i also get different bitslip results when loading multiple times via JTAG, that's the next things to investigate :)12:32
_florent_things/thing12:32
somlo_florent_, gregdavill: memtest on rocket+litex on the trellisboard (ecp5-85k) used to fail 30-50% of the time, depending on the week. After yesterday's litedram update, it's down to 10% :) I used to think it had maybe something to do with the board and chips warming up after a while (when the error rate would decrease)...12:41
somlonot sure I'm helping, but figured I'd throw in an extra data point, fwiw...12:41
somlonever made it to actually tinkering with the litedram settings in a systematic way, so thanks for doing that!12:42
*** Skip has joined #litex13:29
*** gregdavill has quit IRC13:32
_florent_somlo: thanks for the feedback14:52
*** CarlFK has quit IRC16:04
*** CarlFK has joined #litex16:16
st-gouri-Hi! We are sending bulk (around 150kbytes total) data to a wishbone target, and it's very very slow, around 50 bytes per second. Is there some documentation about overhead and what we could do?16:46
lfst-gouri-: i am new to this but could you give some extra info like: is that over a bridge, how many masters are on the bus?16:55
st-gouri-lf, sure, thanks.16:55
st-gouri-To be clearer, PC runs wishbone client code in python, connected to a litex_server, that litex_server sends data through a 3-wire UART to the design.  so far so good?16:56
st-gouri-The bulk data is used to drive a second UART that sends the bulk data to some other device.16:57
st-gouri-Currently, we send bytes one by one and it's not clear if that is the actual performance killer.16:57
st-gouri-We have understood that an event register is necessary to read data back to our client. Writing 2 to the UART_EV_PENDING register signals that we have read the byte from UART_RXTX, so that the design makes the next byte received by that UART available at the register.17:03
zypAFAIK the uart protocol does 32-bit transfers, so if you're only using 8 of them, that's a 4x overhead in itself17:03
lfbut still you should be able to get nearly 1000 requests per sec17:04
st-gouri-lf, interesting.17:04
zypwhat baudrate does the bridge run at?17:05
st-gouri-Let me check.17:05
st-gouri-zyp, bridge at 115200 bauds.17:05
st-gouri-Any idea about the overhead of a request?17:05
lfunrelated but i think this is wrong. https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L28617:05
tpbTitle: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)17:05
lfit checks the data_width not address_width17:06
zypare you doing interleaved reads and writes as well?17:09
zypif so, you're also subject to the latency of any usb-uart involved17:09
zypI've found that one of my usb-uart adapters takes like 10x longer to transfer a litescope buffer than another, because it's using a buffering strategy optimized for throughput rather than latency, or something like that17:11
_florent_st-gouri-: i also think it could be related to interleaved reads/writes as zyp suggests17:13
_florent_st-gouri-: the UART bridge is not very fast, but it shouldn't be that slow...17:13
st-gouri-How many bytes are sent by litex_server for each request?17:15
zypI'm guessing 817:16
zypfour for the address and four for the data17:16
st-gouri-Sounds reasonable.17:16
lfcmd, length, 4x addr, 4x data i think for write17:16
st-gouri-Regarding interleaved read and write, in spirit yes, let me check.17:16
zypah, yeah, of course there has to be a cmd as well17:16
zypyeah, lf is correct, here's a usb-capture I did of wishbone-bridge traffic some weeks ago17:19
zyphttps://bin.jvnv.net/file/GtKiI.png17:19
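The framing lf describes (cmd, length, 4 address bytes, 4 data bytes per word) can be sketched as below. This is a minimal illustration, not the bridge's reference code: the command byte values and the big-endian byte order are assumptions that should be checked against litex/soc/cores/uart.py, and `write_frame` / `read_frame` are hypothetical helper names.

```python
import struct

# Assumed command codes; verify against litex/soc/cores/uart.py before use.
CMD_WRITE = 0x01
CMD_READ = 0x02


def write_frame(addr, words):
    """Build a write request: cmd, length, 4 address bytes, 4 bytes per word.

    Byte order is assumed to be big-endian.
    """
    if not 1 <= len(words) <= 255:
        raise ValueError("length must fit in one byte and be non-zero")
    header = struct.pack(">BBI", CMD_WRITE, len(words), addr)
    payload = b"".join(struct.pack(">I", w & 0xFFFFFFFF) for w in words)
    return header + payload


def read_frame(addr, count=1):
    """Build a read request: cmd, length, 4 address bytes (no data)."""
    return struct.pack(">BBI", CMD_READ, count, addr)
```

Under these assumptions a single-word write is 10 bytes on the wire, which is the figure used later in the discussion when one UART payload byte is carried per request.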
*** scanakci has joined #litex17:20
lfbut if i read the UARTBone correctly it can only make data_width aligned writes17:20
zypthat'd be pretty natural17:20
zypI'm assuming it doesn't do much in the way of width conversion at all17:21
st-gouri-Can we in one etherbone request send several values in sequence to the same register?17:21
zypI don't think so17:22
st-gouri-Like, send 16 bytes to be sent to the same address?17:22
zypI mean, I don't know, but I would guess not17:22
st-gouri-Or it would need to extend the protocol. Haven't actually looked at it. Might be an option.17:22
st-gouri-We might have to send much data through it later on.  Perhaps interactive terminal. Perhaps gdb server protocol.17:23
lfok any more details? you have one UARTBone and one UART on the bus. and nothing else?17:24
lfboth from litex?17:24
zypjudging by https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L344, it's always incrementing the address if writing more than one word17:24
tpbTitle: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)17:24
st-gouri-lf, both from litex. Not much activity inside the design at this point in time.17:25
lfst-gouri-: so only one master on the bus?17:25
st-gouri-zyp, interesting.17:25
st-gouri-Mhmhm, most probably only one active master on the bus, the uart fed by the litex_server.17:26
zypst-gouri-, what is your goal for what you're doing?17:27
st-gouri-Our design is a kind of I/O chip for an external (separate chip) CPU.17:28
zypand you're planning to use uart for the interface towards the CPU?17:29
st-gouri-The UART that we drive from the PC, along with some GPIO pins, is now capable of sending the correct sequence to boot the CPU (microcontroller). But it's slow.17:29
st-gouri-Booting the CPU involves to GPIO and an UART.17:30
st-gouri-Booting the CPU involves two (2) GPIO and an (1) UART.17:30
zypand what will drive this?17:31
st-gouri-zyp, not sure about what you're asking.17:31
lfok if it looks like this "INFO:SoCBusHandler:Interconnect: InterconnectShared (1 <-> 2)." i can cross off rogue bus master from my list. and go to python usb.17:31
zypst-gouri-, right now you're using a PC to talk to the wishbone bridge17:31
st-gouri-After the CPU is booted, it will drive the I/O chip by targetting the wishbone bus. Probably through SPI.17:31
st-gouri-In the field, the design will manage to setup the CPU by itself. No PC will be needed.17:32
zypso if you won't use the uart bridge in the field, I guess the performance of it is kinda moot17:33
st-gouri-One simple solution in the field is to have the design (in a FPGA) boot a softcore that runs some code to target wishbone. Same actions, lower overhead.17:33
st-gouri-zyp, not so moot, because currently, to boot the CPU we need to send 160k of data and that takes 41 minutes. That is a problem.17:34
zypif you want a faster bridge, I guess you could try valentyusb17:34
st-gouri-In another setup, the PC directly driving the CPU with an UART, we get 10kbytes/s, not 50 bytes/sec.17:34
st-gouri-zyp, at this point I'm trying to understand the actual source of slowness. (I have a simple external logic analyzer at my disposal, too.)17:35
lfload a bitstream to connect the pins boot cpu load new bitstream?17:35
lfyou are polling the status register to see if it finished?17:37
_florent_st-gouri-: are you able to provide a minimal design to reproduce the issue?17:37
_florent_i could look at this17:37
st-gouri-lf, planned in the field, FPGA gets the design from flash, which begets a small softcore CPU. That CPU boots from same flash, runs software. That software targets through wishbone GPIO and UART. These boot the main processor. Then the main processor (fast) can do whatever, targetting wishbone through SPI.17:38
_florent_the bridge supports bursts, but not sure the current software makes use of it17:38
st-gouri-lf, which status register? EV_PENDING ?17:38
st-gouri-_florent_, bursts to same address?17:39
_florent_st-gouri-: no this is incrementing17:39
_florent_st-gouri-: but we could eventually add a different command for non-incrementing writes17:40
st-gouri-If the UART has, say, 16 bytes buffer, and we send by burst of 16 bytes, then we might have only 60% overhead instead of 1000%.17:40
st-gouri-But I suspect there is something else.17:40
_florent_st-gouri-: yes i also suspect there is something else17:40
_florent_can you share the part of the python code on the host that is doing the upload?17:41
st-gouri-Yes.17:41
st-gouri-Will be all open-source eventually, but is not yet. Will pastebin parts.17:42
lfand can you hook up a logic analyser to both uarts? maybe the USB-UART adapter is really slow with short messages17:43
st-gouri-One question about EV_PENDING. Is it set after any byte sent? Or when send buffer becomes empty?17:43
st-gouri-Probably the former.17:44
st-gouri-https://paste.ubuntu.com/p/KrqCcCcv59/17:45
tpbTitle: Ubuntu Pastebin (at paste.ubuntu.com)17:45
st-gouri-Wow, the indentation appears wrong.17:45
st-gouri-The first line "def" should be aligned with the second "def".17:45
st-gouri-It's part of a Python class that mimics the standard class for Uart, but routes through python wishbone client.17:46
_florent_st-gouri-:  for the IRQ/Pending:17:47
_florent_https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L22417:47
tpbTitle: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)17:47
_florent_https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L23717:47
tpbTitle: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)17:47
st-gouri-_florent_, ahah, "non-full".17:47
st-gouri-That's very nice for a local ISR. This allows nice pipelining to use the full UART bandwidth. Cool.17:48
_florent_st-gouri-: could you do a quick test to see if the issue is related to the interleaved reads/writes: remove the while (self.isTxFull()) and use a time.sleep() after self._wb.regs.uart_rxtx.write(byte)17:50
st-gouri-_florent_, good idea.17:50
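The experiment _florent_ proposes can be sketched as follows. The pasted class is not reproduced in the log, so this uses a hypothetical `FakeBridge` stand-in that only counts bridge transactions; `send_polled` and `send_paced` are illustrative names, not functions from the original code.

```python
import time


class FakeBridge:
    """Hypothetical stand-in for the remote Wishbone client; counts transactions."""

    def __init__(self):
        self.reads = 0
        self.writes = 0
        self.sent = bytearray()

    def read_tx_full(self):
        # Each status poll is a full request/response round trip over the bridge.
        self.reads += 1
        return False

    def write_rxtx(self, byte):
        # A write needs no response, so the host never waits on it.
        self.writes += 1
        self.sent.append(byte)


def send_polled(bridge, data):
    """Original pattern: poll TX-full before every byte (interleaved read/write)."""
    for byte in data:
        while bridge.read_tx_full():
            pass
        bridge.write_rxtx(byte)


def send_paced(bridge, data, delay_s=0.0):
    """Test pattern: write-only, optionally pacing with a fixed sleep."""
    for byte in data:
        bridge.write_rxtx(byte)
        if delay_s:
            time.sleep(delay_s)
```

The point of the comparison is that the polled variant performs at least one read round trip per byte, while the paced variant performs none, so any per-read latency on the host side only affects the first.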
st-gouri-Full class if it is of any use: https://paste.ubuntu.com/p/zvj2TbVHnF/17:50
tpbTitle: Ubuntu Pastebin (at paste.ubuntu.com)17:51
st-gouri-MMh, I have a strange unrelated error "index out of range", have to check that.17:53
st-gouri-pycharm does not let me know exactly where the exception is raised.18:02
*** FFY00 has quit IRC18:04
st-gouri-Ah, some progress.18:04
*** FFY00 has joined #litex18:05
*** FFY00 has quit IRC18:06
*** FFY00 has joined #litex18:07
st-gouri-_florent_, I simply disabled the call to isTxFull(), and get speed up to 973 bytes/sec.18:07
st-gouri-So, the problem is most probably the interleaved read/writes.18:08
st-gouri-973 bytes/sec is much more in line with what is expected from a 10-bytes long packet sending one payload byte.18:09
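A rough upper bound for that figure, assuming 8N1 framing on the bridge UART (10 bits on the wire per byte, an assumption not stated in the log) and the 10-byte request carrying one payload byte. `payload_rate` is a hypothetical helper name.

```python
def payload_rate(baud, frame_bytes=10, bits_per_wire_byte=10, payload_bytes=1):
    """Upper bound on payload bytes/s for write-only traffic over the UART bridge.

    bits_per_wire_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    """
    wire_bytes_per_second = baud / bits_per_wire_byte
    frames_per_second = wire_bytes_per_second / frame_bytes
    return frames_per_second * payload_bytes


ceiling = payload_rate(115200)  # 1152.0 payload bytes per second
```

The measured 973 bytes/sec sits a little under this ceiling, which is consistent with the link itself being the limit once the reads are removed.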
st-gouri-The transfer was successful and took 2 minutes 25 seconds instead of 41 minutes when interleaving reads and writes.18:10
st-gouri-I know it's successful because the CPU has definitely booted and runs our code.18:10
lfya i would say zyp is right with the slow uart adapter. it probably takes some time for it to flush its buffer.18:11
st-gouri-Is the problem in the design of the UART, litex-level?18:14
st-gouri-Or in a third-party USB-UART bridge?18:15
lfnot sure. but i do know that some ppl curse about uart adapters buffering small transfers for long times.18:16
st-gouri-In this experiment, the UART adapter is soldered in the TinyFPGA BX board being used.18:17
st-gouri-Ah, no wrong.18:18
st-gouri-The UART advertises FTDI TTL232R-3V3 idVendor=0403, idProduct=6001, bcdDevice= 6.0018:18
st-gouri-Do you expect we might have better performance with another one?18:19
lfya i never had that problem so i don't know.18:19
lfand if it were a problem with that chip i don't think they would use it18:20
st-gouri-Was wrong when mentioning TinyFPGA Bx. It's not that one.  It's a separate independent cable, with the device details I provided.18:22
st-gouri-So, now I'm sending data as fast as I can without checking the TX-Full register, and it definitely can't be full, because it has n times more time than it needs, where n is the side of the etherbone packet!18:24
st-gouri-s/side/size/18:24
lfok the ftdi should flush its buffer every 16ms18:25
lfmmh that is like 60Hz, or you know ~50 bytes over the uart18:27
lfhttps://www.ftdichip.com/Support/Documents/AppNotes/AN232B-04_DataLatencyFlow.pdf18:27
lfpage 618:27
lf718:27
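lf's estimate can be written out as a small calculation. It assumes each payload byte costs one read round trip (the TX-full poll) and that every response waits for the adapter's latency timer to expire before reaching the host; `polled_bytes_per_second` is a hypothetical helper name.

```python
def polled_bytes_per_second(latency_timer_ms, round_trips_per_byte=1):
    """Payload bytes/s when every byte waits on round trips gated by the latency timer.

    Assumes each response sits in the adapter's receive buffer for a full
    latency-timer period before it is forwarded to the host.
    """
    return 1000.0 / (latency_timer_ms * round_trips_per_byte)


default_rate = polled_bytes_per_second(16)  # 62.5 bytes per second
```

With the 16 ms timer this gives about 62 bytes/sec as a best case, the same order as the roughly 50 bytes/sec reported earlier.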
st-gouri-lf, very interesting.18:29
st-gouri-"For application programmers it must be stressed that data should be sent or received using buffers and not individual characters." ... well, this protocol kind of needs interleaving read and writes.18:30
st-gouri-"3.2 Adjusting the Receive Buffer Latency Timer" -> do you know how to do that on Linux? From Python?18:32
lfya the problem is the read. as the response from the bridge gets stuck in the buffer18:32
lfhttps://granitedevices.com/wiki/FTDI_Linux_USB_latency18:32
tpbTitle: FTDI Linux USB latency - Granite Devices Knowledge Wiki (at granitedevices.com)18:32
st-gouri-Thanks lf for those relevant links!18:35
st-gouri-Setting latency_timer to 1 I get 218b/s instead of 50b/s.18:35
st-gouri-This shows that the latency timer is indeed involved in the delay.18:36
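From Python, the latency timer can be changed through the sysfs attribute that the Linux ftdi_sio driver exposes. A minimal sketch, assuming the adapter enumerates as ttyUSB0 (a placeholder) and that the process is allowed to write the attribute, which usually means root or a udev rule; the 1..255 ms range check is also an assumption. `set_ftdi_latency_timer` is a hypothetical helper name.

```python
from pathlib import Path


def set_ftdi_latency_timer(tty="ttyUSB0", ms=1,
                           sysfs_root="/sys/bus/usb-serial/devices"):
    """Write the FTDI latency timer (in ms) via sysfs and return the value read back.

    tty is a placeholder device name; sysfs_root is a parameter only so the
    function can be exercised against a scratch directory.
    """
    if not 1 <= ms <= 255:  # assumed valid range for the timer
        raise ValueError("latency timer must be between 1 and 255 ms")
    node = Path(sysfs_root) / tty / "latency_timer"
    node.write_text(f"{ms}\n")
    return int(node.read_text())
```

Calling `set_ftdi_latency_timer("ttyUSB0", 1)` before starting litex_server would correspond to the setting tested above. The value does not persist across replugging the adapter, so it has to be reapplied or set from a udev rule.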
lfwell that is that mystery solved. but how to solve the problem18:36
st-gouri-Let n=10 be the size of an Etherbone packet writing one byte to the UART.18:36
st-gouri-As long as the baudrate PC side is not higher than n times the baudrate on the other UART, we're safe.18:37
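That condition can be expressed as a one-line check. The baud rates in the usage note are illustrative, since the log does not state the second UART's rate, and `blind_writes_are_safe` is a hypothetical helper name.

```python
def blind_writes_are_safe(bridge_baud, target_baud, frame_bytes=10):
    """True if write-only traffic cannot outrun the target UART.

    Each payload byte costs frame_bytes bytes on the bridge side, so the target
    UART gets frame_bytes byte-times per byte as long as the bridge is at most
    frame_bytes times faster than the target.
    """
    return bridge_baud <= frame_bytes * target_baud
```

For example, with a hypothetical target UART at 115200 baud, a bridge at 115200 or 1152000 baud satisfies the check, while the same bridge speeds against a 57600 baud target would only be safe at 115200.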
lflol yes18:37
st-gouri-And... guess what the next step will be to gain some performance. ;-)18:41
st-gouri-More seriously, what we have done today is very good. We understood the reason for such slow performance, got a fix, proved that ugly as it looks it is actually safe.18:43
lfdo i even dare guessing18:43
st-gouri-Currently, the litex_server runs at 115200. One solution is to pump it up to 10 times that speed and call it a day.18:44
lfst-gouri-: can you just bypass all the logic and switch the bypass off after you are done. but sure if it's only for development it's probably the best solution18:56
st-gouri-Bypass all which logic?18:56
lfmy writing bad18:56
lfjust connect uart_a to uart_b this comb logic. like a mux on the tx pin18:57
lfwith18:57
*** FFY00 has quit IRC19:00
*** FFY00 has joined #litex19:00
*** FFY00 has quit IRC19:01
*** FFY00 has joined #litex19:01
st-gouri-Mfmfmf. We would lose the multiplexing property of the wishbone bridge. Would need to kill litex_server to free the UART. Then would need some way to revert. Once booted the CPU can do that. That could actually work.19:02
st-gouri-If we can beef up the first UART to 1152000 we have the same benefits and no downside.19:03
lftrue19:04
st-gouri-Still, it's interesting.19:04
st-gouri-Many ideas flying. Even if we don't implement most, it's still interesting.19:04
*** FFY00 has quit IRC19:05
*** FFY00 has joined #litex19:05
*** m4ssi has joined #litex19:18
zypsorry, I had to go put the kid to bed :)19:37
zypthe 16ms buffer flush that lf is quoting sounds like the same I ran into19:37
zypone of the adapters I used was based on FT232X19:38
zypand a different adapter (based on a stm32 running a usb-uart firmware) got 10x the throughput at the same baudrate19:38
zypbefore I left I proposed looking into running the valentyusb bridge, i.e. usb directly to the fpga instead of uart19:40
lfzyp: we could add an option to send an event character at the end of a read. but you would still need to configure that in the ftdi chip19:41
zypI haven't tested the performance of that myself, but it should eliminate the buffer latency completely19:42
zypthe valentyusb bridge stuff is based on control requests, which usb should be able to send a couple thousand of per second, so if I'm guessing, you might see a kilobyte or more per second from that19:43
zypbottleneck is going to be how fast the host side is passing requests and replies between userland and the usb controller19:44
lfya19:47
st-gouri-reading19:47
st-gouri-Thanks for the hints.19:48
lfi would need to look into litescope but maybe there is a way to change the reading behavior to get less overhead when reading or optimize it more for this buffering behavior19:48
st-gouri-TinyFPGA BX and Fomu use valentyUSB for their bootloader, IIRC.19:48
zypst-gouri-, yes19:49
zyplf, the problem with the wishbone bridge is that currently the only way to get flow control is to poll a register in between reads or writes19:53
zypalthough litescope shouldn't need that when dumping the buffer, so the protocol could be extended to add a «read address A N times»19:55
zypas far as I can see, the length argument is not considered for reads currently, only writes19:56
zypand non-incrementing reads and writes are necessary for dealing with fifo registers19:57
lfbut does that help? it will just read the address as fast as the bus can. i think using the event char or the flow control lines of the ftdi chip to force a buffer flush would be more helpful.19:59
lfor using the usb bridge19:59
zyphelp what? it'd help for litescope19:59
zypthe slow part of using litescope is dumping the capture buffer after the capture is finished20:00
lfah i have not read how litescope transfers data. but if that is behind one address then yes that should give a big boost20:01
zypyeah, litescope puts everything behind a set of CSRs20:01
zypone of them pops from a fifo20:02
zypactually, it's not even doing flow control, it's just the constant back and forth between «READ 1 from ADDR» «DATA» with latency in between20:04
lfya you could just not wait for the response and send the next request20:05
zypthat is true20:05
zypbut I wonder if that would risk overflowing the uart on the fpga20:06
zypalthough as long as it has deep enough buffers it should be fine20:07
lfi dont think UARTBone has a buffer. only the UART for CSR has fifos.20:10
lfah there is a "FT245 Asynchronous FIFO mode" but i think that uses extra pins20:11
zypbut we're discussing software fixes now :)20:11
lfya20:11
zyphardware solutions are not very useful if you already have a hardware design you don't want to modify20:12
st-gouri-   will go afk20:16
lfbut i think you are right that would overwhelm the uart.20:16
zypyeah, it'd need enough buffering that it could start receiving the second read while replying to the first20:16
lfyou could set the event char to zero because reading a csr register will always return 3 zero bytes20:17
zypnot for 32-bit CSRs20:17
lfarg right20:17
lfwe need our own uart adapter that is uartbone aware20:19
st-gouri-Wht20:20
st-gouri-sory20:20
st-gouri-sorry20:20
zyplf, just have the rx buffer flush at four bytes :)20:20
st-gouri-A modified UARTbone could be fed 4-bytes at a time... but how to tell it there's only 3 bytes.20:21
lfya or let it handle read write and you just give it a list of addresses you need20:21
st-gouri-The fourth byte would be a status to tell if there's one, two or three bytes in the packet?20:22
st-gouri-Overhead divided by 3.20:22
st-gouri-Not only when driven through litex_server. Also locally.20:23
st-gouri-Unused bits of the fourth byte could be UART lines.20:24
st-gouri-But I digress.20:24
lfya for now the biggest delay is the 1-16ms buffer delay of the ftdi. and without changing the bitstream, the only way to make that fast is to put litex server on the adapter20:29
lfmmh pi zero?20:29
st-gouri-Seeya.20:30
lfbye20:30
lfi think when i get that problem i will just try some cortex-a with linux and run litex-server on that with its native uart.20:31
*** Skip has quit IRC20:35
lfn820:37
*** daveshah has quit IRC21:15
*** daveshah has joined #litex21:15
*** st-gouri- has quit IRC22:42
*** m4ssi has quit IRC22:53
