Tuesday, 2020-06-30

*** tpb has joined #litex		00:00
*** lf has quit IRC		00:02
*** lf has joined #litex		00:02
*** st-gourichon-f has joined #litex		00:03
*** st-gourichon-fid has quit IRC		00:04
gregdavill	_florent_: Thanks for looking into that so quickly! I've pulled your litedram changes and altered the CSR module in my design with the external reset signal. It's looking good here.	00:33
gregdavill	An interesting observation, I'm getting different bitslip results if I load my design via JTAG, compared to if it's loaded from FLASH.	00:35
*** Degi has quit IRC		01:53
*** Degi has joined #litex		01:54
*** jaseg has quit IRC		02:15
*** jaseg has joined #litex		02:16
*** guan has quit IRC		02:36
*** bubble_buster has quit IRC		02:36
*** mithro has quit IRC		02:37
*** levi has quit IRC		02:37
*** mithro has joined #litex		02:38
*** bubble_buster has joined #litex		02:39
*** guan has joined #litex		02:41
*** levi has joined #litex		02:44
*** m4ssi has joined #litex		05:07
*** kgugala_ has joined #litex		05:08
*** kgugala has quit IRC		05:09
*** m4ssi has quit IRC		06:23
*** st-gouri- has joined #litex		06:47
*** st-gourichon-f has quit IRC		06:50
*** kgugala has joined #litex		07:11
*** kgugala_ has quit IRC		07:15
*** gregdavill has quit IRC		07:42
*** leons has quit IRC		08:38
*** CarlFK[m] has quit IRC		08:38
*** disasm[m] has quit IRC		08:38
*** sajattack[m] has quit IRC		08:38
*** nrossi has quit IRC		08:38
*** david-sawatzke[m has quit IRC		08:38
*** john_k[m] has quit IRC		08:38
*** xobs has quit IRC		08:38
*** david-sawatzke[m has joined #litex		08:52
*** gregdavill has joined #litex		09:03
*** xobs has joined #litex		09:24
*** disasm[m] has joined #litex		09:24
*** CarlFK[m] has joined #litex		09:24
*** john_k[m] has joined #litex		09:24
*** sajattack[m] has joined #litex		09:24
*** nrossi has joined #litex		09:24
*** leons has joined #litex		09:24
*** kgugala_ has joined #litex		09:53
*** scanakci has quit IRC		09:55
*** kgugala_ has quit IRC		09:55
*** kgugala has quit IRC		09:55
*** kgugala has joined #litex		09:55
_florent_	gregdavill: indeed i also get different bitstlip results when loading multiple time via JTAG, that's the next things to investigate :)	12:32
_florent_	things/thing	12:32
somlo	_florent_, gregdavill: memtest on rocket+litex on the trellisboard (ecp5-85k) used to fail 30-50% of the time, depending on the week. After yesterday's litedram update, it's down to 10% :) I used to think it had maybe something to do with the board and chips warming up after a while (when the error rate would decrease)...	12:41
somlo	not sure I'm helping, but figured I'd throw in an extra data point, fwiw...	12:41
somlo	never made it to actualy tinkering with the litedram settings in a systematic way, so thanks for doing that!	12:42
*** Skip has joined #litex		13:29
*** gregdavill has quit IRC		13:32
_florent_	somlo: thanks for the feedback	14:52
*** CarlFK has quit IRC		16:04
*** CarlFK has joined #litex		16:16
st-gouri-	Hi! We are sending bulk (around 150kbytes total) data to a wishbone target, and it's very very slow, around 50 bytes per second. Is there some documentation about overhead and what we could do?	16:46
lf	st-gouri-: i am new to this but could you give some extra info like: is that over a bridge, how many master are on the bus.	16:55
st-gouri-	lf, sure, thanks.	16:55
st-gouri-	To be clearer, PC runs wishbone client code in python, connected to a litex_server, that litex_server sends data through a 3-wire UART to the design. so far so good?	16:56
st-gouri-	The bulk data is used to drive a second UART that sends the bulk data to some other device.	16:57
st-gouri-	Currently, we send bytes one by one and it's not clear if that is the actual performance killer.	16:57
st-gouri-	We have understood that an event register is necessary to read data back to our client. Writing 2 to the UART_EV_PENDING registers signals that we have read the byte from UART_RXTX, so that the design make the next byte received by that UART available at the register.	17:03
zyp	AFAIK the uart protocol does 32-bit transfers, so if you're only using 8 of them, that's a 4x overhead in itself	17:03
lf	but still you should be able to get nearly 1000 request per sec	17:04
st-gouri-	lf, interesting.	17:04
zyp	what baudrate does the bridge run at?	17:05
st-gouri-	Let me check.	17:05
st-gouri-	zyp, bridge at 115200 bauds.	17:05
st-gouri-	Any idea about the overhead of a request?	17:05
lf	unreleated but i think this is wrong. https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L286	17:05
tpb	Title: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)	17:05
lf	it checks the data_width not address_width	17:06
zyp	are you doing interleaved reads and writes as well?	17:09
zyp	if so, you're also subject to the latency of any usb-uart involved	17:09
zyp	I've found that one of my usb-uart adapters takes like 10x longer to transfer a litescope buffer than another, because it's using a buffering strategy optimized for throughput rather than latency, or something like that	17:11
_florent_	st-gouri-: i also think it could be related to interleaved reads/writes as zyp suggests	17:13
_florent_	st-gouri-: the UART bridge is not very fast, but it shouldn't be that slow...	17:13
st-gouri-	How many bytes are sent by litex_server for each request?	17:15
zyp	I'm guessing 8	17:16
zyp	four for the address and four four the data	17:16
st-gouri-	Sounds reasonable.	17:16
lf	cmd, length, 4x addr, 4x data i think for write	17:16
st-gouri-	Regarding interleaved read and write, in spirit yes, let me check.	17:16
zyp	ah, yeah, of course there has to be a cmd as well	17:16
zyp	yeah, lf is correct, here's a usb-capture I did of wishbone-bridge traffic some weeks ago	17:19
zyp	https://bin.jvnv.net/file/GtKiI.png	17:19
*** scanakci has joined #litex		17:20
lf	but if i read the UARTBone corretly it can only it can only make data_width alinged writes	17:20
zyp	that'd be pretty natural	17:20
zyp	I'm assuming it doesn't do much in the way of width conversion at all	17:21
st-gouri-	Can we in one etherbone request send several values in sequence to the same register?	17:21
zyp	I don't think so	17:22
st-gouri-	Like, send 16 bytes to be sent to the same address?	17:22
zyp	I mean, I don't know, but I would guess not	17:22
st-gouri-	Or it would need to extend the protocol. Haven't actually looked at it. Might be an option.	17:22
st-gouri-	We might have to send much data through it later on. Perhaps interactive terminal. Perhaps gdb server procotol.	17:23
lf	ok any more deatails? you have one UARTBone and on UART on the bus. and nothing else?	17:24
lf	both from litex?	17:24
zyp	judging by https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L344, it's always incrementing the address if writing more than one word	17:24
tpb	Title: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)	17:24
st-gouri-	lf, both from litex. Not much activity inside the design at this point in time.	17:25
lf	st-gouri-: so only one master on the bus?	17:25
st-gouri-	zyp, interesting.	17:25
st-gouri-	Mhmhm, most probably only one active master on the bus, the uart fed by the litex_server.	17:26
zyp	st-gouri-, what is your goal for what you're doing?	17:27
st-gouri-	Our design is a kind of I/O chip for an external (separate chip) CPU.	17:28
zyp	and you're planning to use uart for the interface towards the CPU?	17:29
st-gouri-	The UART that we drive from the PC, along with some GPIO pins, is now capable of sending the correct sequence to boot the CPU (microcontroller). But it's slow.	17:29
st-gouri-	Booting the CPU involves to GPIO and an UART.	17:30
st-gouri-	Booting the CPU involves two (2) GPIO and an (1) UART.	17:30
zyp	and what will drive this?	17:31
st-gouri-	zyp, not sure about what your asking.	17:31
lf	ok if it looks like this "INFO:SoCBusHandler:Interconnect: InterconnectShared (1 <-> 2)." i can cross of rouge bus master from my list. and go to python usb.	17:31
zyp	st-gouri-, right now you're using a PC to talk to the wishbone bridge	17:31
st-gouri-	After the CPU is booted, it will drive the I/O chip by targetting the wishbone bus. Probably through SPI.	17:31
st-gouri-	In the field, the design will manage to setup the CPU by itself. No PC will be needed.	17:32
zyp	so if you won't use the uart bridge in the field, I guess the performance of it is kinda moot	17:33
st-gouri-	One simple solution in the field is to have the design (in a FPGA) boot a softcore that runs some code to target wishbon. Same actions, lower overhead.	17:33
st-gouri-	zyp, not so moot, because currently, to boot the CPU we need to send 160k of data and that takes 41 minutes. That is a problem.	17:34
zyp	if you want a faster bridge, I guess you could try valentyusb	17:34
st-gouri-	In another setup, the PC directly driving the CPU with an UART, we get 10kbytes/s, not 50 bytes/sec.	17:34
st-gouri-	zyp, at this point I'm trying to understand the actual source of slowliness. (I have a simple external log analyzer at my disposal, too.)	17:35
lf	load a bitstream to connect the pins boot cpu load new bitstream?	17:35
lf	you are pulling the status register to see it it finnisched?	17:37
_florent_	st-gouri-: are you able to provide a minimal design to reproduce the issue?	17:37
_florent_	i could look at this	17:37
st-gouri-	lf, planned in the field, FPGA gets the design from flash, which begets a small softcore CPU. That CPU boots from same flash, runs software. That software targets through wishbone GPIO and UART. These boot the main processor. Then the main processor (fast) can do whatever, targetting wishbone through SPI.	17:38
_florent_	the bridge supports bursts, but not sure the current software makes use of it	17:38
st-gouri-	lf, which status register? EV_PENDING ?	17:38
st-gouri-	_florent_, bursts to same address?	17:39
_florent_	st-gouri-: no this is incrementing	17:39
_florent_	st-gouri-: but we could eventually add a different comment for non-incrementing writes	17:40
st-gouri-	If the UART has, say, 16 bytes buffer, and we send by burst of 16 bytes, then we might have only 60% overhead instead of 1000%.	17:40
st-gouri-	But I suspect there is something else.	17:40
_florent_	st-gouri-: yes i also suspect there is something else	17:40
_florent_	can you share the part of the python code on the host that is doing the upload?	17:41
st-gouri-	Yes.	17:41
st-gouri-	Will be all open-source eventually, but is not yet. Will pastebin parts.	17:42
lf	and can you hookup a logic analyser to both uarts? i maybe the USB-UART adapter is realy the slow with short messages	17:43
st-gouri-	One question about EV_PENDING. Is it set after any byte sent? Or when send buffer becomes empty?	17:43
st-gouri-	Probably the former.	17:44
st-gouri-	https://paste.ubuntu.com/p/KrqCcCcv59/	17:45
tpb	Title: Ubuntu Pastebin (at paste.ubuntu.com)	17:45
st-gouri-	Wow, the indentation appears wrong.	17:45
st-gouri-	The first line "def" should be aligned with the second "def".	17:45
st-gouri-	It's part of a Python class that mimicks the standard class for Uart, but routes through python wishbone client.	17:46
_florent_	st-gouri-: for the IRQ/Pending:	17:47
_florent_	https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L224	17:47
tpb	Title: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)	17:47
_florent_	https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/uart.py#L237	17:47
tpb	Title: litex/uart.py at master · enjoy-digital/litex · GitHub (at github.com)	17:47
st-gouri-	_florent_, ahah, "non-full".	17:47
st-gouri-	That's very nice for a local ISR. This allows nice pipelining to use the full UART bandwidth. Cool.	17:48
_florent_	st-gouri-: could you do a quick test to see if the issue is related to the interleaved reads/writes: remove the while (self.isTxFull()) and use a times.sleep() after self._wb.regs.uart_rxtx.write(byte)	17:50
st-gouri-	_florent_, good idea.	17:50
st-gouri-	Full class if it is of any use: https://paste.ubuntu.com/p/zvj2TbVHnF/	17:50
tpb	Title: Ubuntu Pastebin (at paste.ubuntu.com)	17:51
st-gouri-	MMh, I have a strange unrelated error "index out of range", have to check that.	17:53
st-gouri-	pycharm does not let me know exactly where the exception is raised.	18:02
*** FFY00 has quit IRC		18:04
st-gouri-	Ah, some progress.	18:04
*** FFY00 has joined #litex		18:05
*** FFY00 has quit IRC		18:06
*** FFY00 has joined #litex		18:07
st-gouri-	_florent_, I simply disabled the call to isTxFull(), and get speed up to 973 bytes/sec.	18:07
st-gouri-	So, the problem is most probably the interleaved read/writes.	18:08
st-gouri-	973 bytes/sec is much more inline with what is expected from a 10-bytes long packet sending one payload byte.	18:09
st-gouri-	The transfer was successful and took 2 minutes 25 seconds instead of 41 minutes when interleaving reads and writes.	18:10
st-gouri-	I know it's successful because the CPU has definitely booted and runs our code.	18:10
lf	ya i would say zyp is right with the slow uart adapter. it probebly thakes some time for it to flush its buffer.	18:11
st-gouri-	Is the problem in the design of the UART, litex-level?	18:14
st-gouri-	Of in a third-party USB-UART bridge?	18:15
lf	not sure. but i do know that some ppl curse about uart adapperts buffering small transfers for long times.	18:16
st-gouri-	In this experiment, the UART adapter is soldered in the TinyFPGA BX board being used.	18:17
st-gouri-	Ah, no wrong.	18:18
st-gouri-	The UART advertises FTDI TTL232R-3V3 idVendor=0403, idProduct=6001, bcdDevice= 6.00	18:18
st-gouri-	Do you expect we might have better performance with another one?	18:19
lf	ya i never had that problem so i don't know.	18:19
lf	and if it where a problem with that chip i don't think they would use it	18:20
st-gouri-	Was wrong when mentioning TinyFPGA Bx. It's not that one. It's a separate independent cable, with the device details I provided.	18:22
st-gouri-	So, now I'm sending data as fast as I can without checking the TX-Full register, and it definitely can't be full, because it has n times more time that it needs, where n in the side of the etherbone packet!	18:24
st-gouri-	s/side/size/	18:24
lf	ok the ftdi should flush its buffer every 16ms	18:25
lf	mmh that are like 60Hz or you know ~50 byte over the uart	18:27
lf	https://www.ftdichip.com/Support/Documents/AppNotes/AN232B-04_DataLatencyFlow.pdf	18:27
lf	page 6	18:27
lf	7	18:27
st-gouri-	lf, very interesting.	18:29
st-gouri-	"For application programmers it must be stressed that data should be sent or received using buffersand not individual characters." ... well, this protocol kind of needs interleaving read and writes.	18:30
st-gouri-	"3.2Adjusting the Receive Buffer Latency Timer" -> howdo you know how to to that on Linux? From Python?	18:32
lf	ya the problem is the read. as the respons from the brige gets stuck in the buffer	18:32
lf	https://granitedevices.com/wiki/FTDI_Linux_USB_latency	18:32
tpb	Title: FTDI Linux USB latency - Granite Devices Knowledge Wiki (at granitedevices.com)	18:32
st-gouri-	Thanks lf for those relevant links!	18:35
st-gouri-	Setting latency_timer to 1 I get 218b/s instead of 50b/s.	18:35
st-gouri-	This shows that the latency timer is indeed involved in the delay.	18:36
lf	well that is that mystery solved. but how to solve the problem	18:36
st-gouri-	Let n=10 the size of a Etherbone packet writing one byte to the UART.	18:36
st-gouri-	As long as the baudrate PC side is not higher than n times the baudrate on the other UART, we're safe.	18:37
lf	lol yes	18:37
st-gouri-	And... guess what the next step will be to gain some performance. ;-)	18:41
st-gouri-	More seriously, what we have done today is very good. We understood the reason for such slow performance, got a fix, proved that ugly as it looks it is actually safe.	18:43
lf	do i even dare guessing	18:43
st-gouri-	Currently, the litex_server runs at 115200. One solution is to pump it up to 10 times that speed and call it a day.	18:44
lf	st-gouri-: can you just bypass all the logic and switch the bypass of after you are done. but sure it its only for develepment its probebly the best solution	18:56
st-gouri-	Bypass all which logic?	18:56
lf	my writing bad	18:56
lf	just connect uart_a to uart_b this comb logic. like a mux on the tx pin	18:57
lf	with	18:57
*** FFY00 has quit IRC		19:00
*** FFY00 has joined #litex		19:00
*** FFY00 has quit IRC		19:01
*** FFY00 has joined #litex		19:01
st-gouri-	Mfmfmf. We would lose the multiplexing property of the wishbone bridge. Would need to kill litex_server to free the UART. Then would need some way to revert. Once booted the CPU can do that. That could actually work.	19:02
st-gouri-	If we can beef up the first UART to 1152000 we have the same benefits and no downside.	19:03
lf	ture	19:04
st-gouri-	Still, it's interesting.	19:04
st-gouri-	Many ideas flying. Even if we don't implement most, it's still interesting.	19:04
*** FFY00 has quit IRC		19:05
*** FFY00 has joined #litex		19:05
*** m4ssi has joined #litex		19:18
zyp	sorry, I had to go put the kid to bed :)	19:37
zyp	the 16ms buffer flush that lf is quoting sounds like the same I ran into	19:37
zyp	one of the adapters I used was based on FT232X	19:38
zyp	and a different adapter (based on a stm32 running a usb-uart firmware) got 10x the throughput at the same baudrate	19:38
zyp	before I left I proposed looking into running the valentyusb bridge, i.e. usb directly to the fpga instead of uart	19:40
lf	zyp: we could add an option to send an event charater at the end of a read. but you would still need to configure that in the ftdi chip	19:41
zyp	I haven't tested the performance of that myself, but it should eliminate the buffer latency completely	19:42
zyp	the valentyusb bridge stuff is based on control requests, which usb should be able to send a couple thousand of per second, so if I'm guessing, you might see a kilobyte or more per second from that	19:43
zyp	bottleneck is going to be how fast the host side is passing requests and replies between userland and the usb controller	19:44
lf	ya	19:47
st-gouri-	reading	19:47
st-gouri-	Thanks for the hints.	19:48
lf	i would need to look into litescop but maybe there is a way to change the reading behavier to get less overhead when reading or optimies it more for this buffering behevier	19:48
st-gouri-	TinyFPGA BX and Fomu use valentyUSB for their bootloader, IIRC.	19:48
zyp	st-gouri-, yes	19:49
zyp	lf, the problem with the wishbone bridge is that currently the only way to get flow control is to poll a register in between reads or writes	19:53
zyp	although litescope shouldn't need that when dumping the buffer, so the protocol could be extended to add a «read address A N times»	19:55
zyp	as far as I can see, the length argument is not considered for reads currently, only writes	19:56
zyp	and no incrementing reads and writes are necessary for dealing with fifo registers	19:57
lf	but does that help? it will just read the address as fast as the bus can. it think useing the event char or the flow controle lines of the ftdi chip to force a buffer flush. would be more helpfull.	19:59
lf	or useing the usb bridge	19:59
zyp	help what? it'd help for litescope	19:59
zyp	the slow part of using litescope is dumping the capture buffer after the capture is finished	20:00
lf	ah i have not read how litescope transfers date. but if that is behinde one address then yes that should give big bust	20:01
zyp	yeah, litescope puts everything behind a set of CSRs	20:01
zyp	one of them pops from a fifo	20:02
zyp	actually, it's not even doing flow control, it's just the constant back and forth between «READ 1 from ADDR» «DATA» with latency in between	20:04
lf	ya you could just not wait for the response and send the next request	20:05
zyp	that is true	20:05
zyp	but I wonder if that would risk overflowing the uart on the fpga	20:06
zyp	although as long as it has deep enough buffers it should be fine	20:07
lf	i dont think UARTBone has a buffer. only the UART for CSR has fifos.	20:10
lf	ah there is a "FT245 Asynchronous FIFO mode" but i think that uses extra pins	20:11
zyp	but we're discussing software fixes now :)	20:11
lf	ya	20:11
zyp	hardware solutions are not very useful if you already have a hardware design you don't wan't to modify	20:12
st-gouri-	will go afk	20:16
lf	but i think you are right that would overwelm the uart.	20:16
zyp	yeah, it'd need enough buffering that it could start receiving the second read while replying to the first	20:16
lf	you could set the event char to zero because reading a csr register will always return 3 zero bytes	20:17
zyp	not for 32-bit CSRs	20:17
lf	arg right	20:17
lf	we need our own uart adapter that is uartbone aware	20:19
st-gouri-	Wht	20:20
st-gouri-	sory	20:20
st-gouri-	sorry	20:20
zyp	lf, just have the rx buffer flush at four bytes :)	20:20
st-gouri-	A modified UARTbone could be fed 4-bytes at a time... but how to tell it there's only 3 bytes.	20:21
lf	ya or let it handel read write and you just give it a list of addresses you need	20:21
st-gouri-	The fourth byte would be a status to tell if there's one, two or three bytes in the packet?	20:22
st-gouri-	Overhead divided by 3.	20:22
st-gouri-	Not only when driven through litex_server. Also locally.	20:23
st-gouri-	Unused bits of the fourth byte could be UART lines.	20:24
st-gouri-	But I digress.	20:24
lf	ya for now the bigest deleay is the 1-16ms buffer delay of the ftdi. and without changeing the bitstream. the only way to make that fast is to put litex server an the adapter	20:29
lf	mmh pi zero?	20:29
st-gouri-	Seeya.	20:30
lf	bye	20:30
lf	i think when i get that problem i will just try some cortex-a with linux and run litex-server on that with its nativ uart.	20:31
*** Skip has quit IRC		20:35
lf	n8	20:37
*** daveshah has quit IRC		21:15
*** daveshah has joined #litex		21:15
*** st-gouri- has quit IRC		22:42
*** m4ssi has quit IRC		22:53

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!