Sunday, 2020-08-09

*** tpb has joined #yosys00:00
*** maartenBE has quit IRC00:26
*** Degi has quit IRC00:33
*** maartenBE has joined #yosys00:34
*** Degi has joined #yosys00:34
*** emeb_mac has quit IRC01:19
*** emeb has quit IRC01:19
*** emeb has joined #yosys01:20
*** emeb has quit IRC01:27
*** emeb has joined #yosys01:28
*** emeb_mac has joined #yosys01:28
*** cr1901_modern has quit IRC02:00
*** emeb has quit IRC02:01
*** citypw has joined #yosys02:12
*** cr1901_modern has joined #yosys02:28
*** SpaceCoaster has joined #yosys03:45
*** FFY00 has quit IRC03:53
*** FFY00 has joined #yosys03:54
*** az0re has joined #yosys04:32
*** craigo has joined #yosys05:05
*** xtro has quit IRC05:46
*** FL4SHK has quit IRC06:06
*** FL4SHK has joined #yosys06:08
*** emeb_mac has quit IRC06:56
*** craigo has quit IRC07:19
*** FL4SHK has quit IRC07:20
*** craigo has joined #yosys07:21
*** Asu has joined #yosys07:57
*** N2TOH_ has joined #yosys08:23
*** N2TOH has quit IRC08:27
*** kraiskil has joined #yosys08:58
*** kraiskil has quit IRC09:32
*** kraiskil has joined #yosys09:34
pepijndevosdaveshah, in Gowin it seems pip delay depends on fanout. How is this represented in nextpnr(-generic)?09:35
*** kraiskil has quit IRC09:42
Loftypepijndevos: in nextpnr's API, that's up to you; you can probably model it as a base + constant * fanout model09:52
daveshahThis isn't something nextpnr-generic supports at the moment09:53
LoftyBut pip delay in general depends on fanout due to capacitance, I think09:53
daveshahYes although not all arches model this09:53
pepijndevosright, that was more or less what I expected heh09:53
pepijndevosty09:53
daveshahECP5 does so that would be a starting point if you want inspiration09:53
pepijndevosah good to know09:53
pepijndevosI think I'll keep postponing a gowin nextpnr target a while longer until at least clock routing works09:54
pepijndevosI'll probably ping you at that time for some recommendations for a good starting point for a new arch.09:54
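Lofty's "base + constant * fanout" suggestion can be sketched roughly as below. This is only an illustration of the arithmetic, not nextpnr code: the class name, field names and the numeric values are hypothetical placeholders, and real coefficients would have to come from vendor timing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanoutDelayModel:
    """Hypothetical pip delay model: delay = base + per_load * fanout."""

    base_ns: float      # fixed part of the pip delay (assumed value)
    per_load_ns: float  # extra delay contributed by each driven load (assumed value)

    def delay_ns(self, fanout: int) -> float:
        # A pip that drives nothing has no meaningful delay in this sketch.
        if fanout < 1:
            raise ValueError("fanout must be at least 1")
        return self.base_ns + self.per_load_ns * fanout


# Placeholder coefficients, chosen only to show the shape of the model.
model = FanoutDelayModel(base_ns=0.3, per_load_ns=0.02)
for fanout in (1, 4, 16):
    print(f"fanout={fanout:2d} -> {model.delay_ns(fanout):.3f} ns")
```

A linear model like this is the simplest way to capture the capacitance effect mentioned above; an arch that does not model fanout at all is the special case per_load_ns = 0.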
*** citypw has quit IRC10:02
pepijndevosugh... seems Ghidra somehow forgot the analysis data, so I guess I'll be working on something else...10:07
*** citypw has joined #yosys10:14
tux3Aw, I can't get Yosys to accept clocks in interfaces anymore, not even with hacky workarounds :/11:24
tux3>Handling const CLK on $memory$flatten\<snipped path>.mem[437]$55298 ($dff) from module top (removing D path).11:24
mwkhmm11:24
mwkcould you show an example?11:24
tux3I haven't minimized it, but essentially I use my AXI4-lite interface to talk to a small SRAM, so I have a `module my_sram(axi4lite.slave bus)`, and some master talks to it11:28
tux3Well, I did have this bug open https://github.com/YosysHQ/yosys/issues/1592 originally, but at the time I could workaround it by putting my clock in a modport11:28
tpbTitle: Clock in interface port mis-synthesized away (but accepted in modport) · Issue #1592 · YosysHQ/yosys · GitHub (at github.com)11:28
tux3It's probably not an easy fix or a recent regression, so I might just keep this project on the proprietary toolchains for a while more /shrug11:30
mwkjust to be sure, does it work with 4a05cad7f8a6ee57292e5360eb06305e13fc308b?11:31
mwkbecause it may indeed be interfaces, or it may be my screwup in refactoring the pass that detects const clocks in the first place11:32
mwkhmmm11:35
mwkas for the original bug, it seems to be an issue with opt_clean11:36
mwkyou're in luck, I have repairing this godforsaken pass on my immediate todo list11:37
tux3yay. still compiling11:37
tux3waiting on abc, it seems to take a lot longer on 4a05cad7f8a6ee57292e5360eb06305e13fc308b?11:39
tux3yosys is now using 9.5GB RAM and rising, starting to wonder if a BRAM turned into a reg =]11:43
tux3oh it's done. very different output/performance, but same result on 4a05cad7f8a6ee57292e5360eb06305e13fc308b11:45
mwk... weird11:46
tux3hhhhm11:46
tux3uh11:46
mwk... hmmm11:46
mwkso I fixed the clean issue, and it no longer nukes the submodule, but... clock is still not connected11:46
mwknot good11:46
tux3I'm saying it "failed" because i see  ICESTORM_RAM:    12/   32, which is what I had when my clock got removed11:46
tux3But uh, actually on 4a05 I have ICESTORM_LC: 138640/ 7680  1805%11:47
whitequarktry (*ram_block*)11:47
tux3So maybe it actually "worked!"11:47
whitequarkthis forces the memory into a BRAM or fails the build11:47
mwkokay so forget about the opt_clean thing11:48
mwkthat's a legitimate bug triggered by your testcase, but there clearly is another problem, in the SV frontend11:48
mwkand it was about (* keep *) wires getting removed, so nothing that would matter for synthesis11:48
tux3assuming (*ram_block*)  \n logic [data_width-1:0] mem [(1<<addr_width)-1:0];    is the correct syntax, synthesizing11:50
tux3ERROR: cell type '$mem' is unsupported (instantiated as 'foo.bram_rdata[1]_$mem_RD_DATA_4')11:50
mwk... that's not the best error message ever, but it does mean that blockram inference is failing11:51
tux3More info above in the log: https://paste.debian.net/1159736/11:54
tpbTitle: debian Pastezone (at paste.debian.net)11:54
mwk... we'd really have to look into the design11:55
tux3Happy to send it over if it helps, but fair warning the code is uh not very good11:56
tux3It's just a toy riscv core at about 7k lines11:57
whitequarkcompiler maintainers don't really care about code quality for the most part11:57
mwkwhatever it is, I've seen worse11:57
whitequarkalso that11:57
daveshahThe worse the code is the more bugs it usually finds :)11:58
mwk(and if not, I reserve the right to tell random people the war story)11:58
whitequarkat worst we might point out non-synthesizable constructs11:58
whitequarkbut as long as it's all valid synthesizable verilog i honestly can't be bothered even thinking whether it's elegant or not11:58
daveshahIn terms of Verilog, dodgy async stuff is often interesting from a finding weird edge cases point of view11:58
whitequarkright, that's a different point of view :)11:58
tux3well my testbenches pass, and avhdl is happy with it, but it's entirely possible I accidentally have nonsense verilog11:59
tux3Let me tar something up with a Makefile that doesn't require my custom tools to build11:59
mwkfor completeness, I've opened https://github.com/YosysHQ/yosys/pull/2337 for the opt_clean issue affecting your example in #1592, but... this actually doesn't fix the main problem12:02
tpbTitle: opt_clean: Fix module keep rules. by mwkmwkmwk · Pull Request #2337 · YosysHQ/yosys · GitHub (at github.com)12:02
mwk(the device module & instantiation is now kept alive as expected, but the clock/reset lines are unconnected)12:03
tux3Okay here's a tarball: https://drive.google.com/file/d/13IuEX5lXghHEg0bkuVL0CTuMjyp-D8Gq/view?usp=sharing12:08
tux3I've kept the (*ram_block*) so this just fails to build, but the "correct" result is if nextpnr reports something like 28/32 BRAMS used (or, I guess, 1800% utilization)12:08
tux3the offending memory is at ./src/dev/axi4lite_sram.sv:4212:09
mwktux3: well at least the memory inference failure is quite obvious12:20
mwkyou have an asynchronous read port, so using blockram isn't possible12:20
mwkideally you should read synchronously from memory in the same module where it is defined, but yosys with flatten gives you a bit more freedom12:21
mwkit would be fine if you had an async read port directly feeding a register, but you have a problem at this line: wire [data_width-1:0] rdata = bram_rdata[bram_read_index];12:22
tux3oh right, I wanted to move the reg outside the bram module, I was hoping flatten would see it12:22
mwkthis inserts muxes between the $mem and the $dff, preventing merging it12:22
tux3Makes sense12:22
tux3I can just make my bus output comb and put it after the mux, then. Shouldn't be a problem12:23
tux3Thanks, I should have known this =]12:23
*** craigo has left #yosys12:24
mwkI don't quite understand what you want to do12:24
tux3I guess writing `.unregistered(0)` at line 44 is a good enough "fix" to clear that issue, even if it's technically wrong12:25
mwkyeah you'd have to do some pipeline fixing12:27
mwkalso I don't know why you're so carefully splitting the bram into bram_blocks, that's something yosys does for you12:27
tux3for my caches the block size is tied to tag size, addr space, etc, I had in mind to make it "portable" by having each arch set reasonable parameters12:30
tux3not sure if that makes sense, I just got used to building things up from blocks and arch-specific params12:31
tux3I'm still not sure how the async ram "works" with 4a05cad but on master the whole module gets optimized out. But I guess keeping the clean synchronous ram works all the time, so I'll just do that.12:37
tux3I really appreciate the help, thank you. (And sorry that I don't have a more interesting bug to show for it!)12:38
*** _whitelogger has quit IRC12:48
*** _whitelogger has joined #yosys12:50
*** emeb has joined #yosys13:20
*** SpaceCoaster has quit IRC13:46
*** maartenBE has quit IRC14:08
*** maartenBE has joined #yosys14:10
pepijndevosdaveshah, I plotted wire delay for gowin, and it seems offset of 0.5 is pretty good, but the proportional part is more like 0.05, correct? https://ibb.co/bXTnSPL Y is ns, X is wire length14:30
tpbTitle: delay — ImgBB (at ibb.co)14:30
pepijndevosAssuming wire length is measured in grid units14:31
daveshahYes, that sounds believable14:35
daveshahIt is manhattan distance yeah14:35
pepijndevosok thanks14:36
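The wire delay fit discussed above (offset of about 0.5 ns, slope of about 0.05 ns per grid unit, length measured as Manhattan distance) amounts to the small estimate below. The function names and the (x, y) tuple representation are assumptions made for illustration; only the two coefficients and the distance metric come from the conversation.

```python
def manhattan_distance(src, dst):
    """Manhattan distance between two (x, y) grid locations."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def estimate_wire_delay_ns(src, dst, offset_ns=0.5, per_unit_ns=0.05):
    """Linear estimate: fixed offset plus a term proportional to wire length.

    The defaults are the rough values read off pepijndevos' plot; they are
    not datasheet numbers.
    """
    return offset_ns + per_unit_ns * manhattan_distance(src, dst)


# Example: a connection spanning 3 columns and 4 rows (7 grid units).
print(f"{estimate_wire_delay_ns((0, 0), (3, 4)):.2f} ns")
```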
pepijndevoshmmm, so gowin wires can be tapped at the ends and halfway, so I'm not really sure how that works timing wise. They don't list delays for e.g. 4 distance15:16
pepijndevosAlthough... I'm not sure but probably the actual wire length doesn't matter so much as the parasitics of the wire15:21
pepijndevoshuh... I'm confused... so there is a tile full of muxes that select from many inputs to a fixed output. Does the pip delay correspond to that output, or to the input it selects from?15:30
pepijndevosI'm assuming the former15:31
*** citypw has quit IRC15:50
*** kraiskil has joined #yosys18:05
*** thardin has quit IRC18:38
*** _whitelogger has quit IRC19:12
*** _whitelogger has joined #yosys19:14
*** kraiskil has quit IRC19:17
*** m4ssi has joined #yosys19:32
*** Asu has quit IRC19:44
awyglehas anybody tried replicating the numbers from table 3.20 in the ECP5 datasheet with nextpnr and/or diamond?20:10
awygleit's a list of "basic functions" and their "register-to-register performance"20:10
daveshahNope20:11
daveshahNot that I know of20:11
*** m4ssi has quit IRC20:13
awyglei got 207.3 MHz for a 64-bit adder as compared to the 441 MHz suggested in the datasheet, using nextpnr20:13
awygle(and yosys, which is probably more relevant now that i think about it)20:14
awygleabc9 improves it slightly to 217.34 MHz20:19
daveshahIt might be related to register packing or something20:19
daveshahThere may also be issues with parts being pulled apart by the connections to the IO20:19
awyglecould be. i have inputs->reg x2->adder->reg->output20:20
daveshahThat is probably OK20:20
daveshahIs the speed grade the same as the datasheet?20:20
awygleyep, i'm running --speed 820:22
awygleand the table says "-8 timings"20:22
daveshahI see20:22
daveshahIt's probably suboptimal register placement for some reason20:22
daveshahBut quite frankly I have bigger things to worry about than microbenchmark performance20:23
awygleyeah, definitely20:23
awyglei just was curious about achievable speeds20:23
awyglei don't have a 64-bit wide datapath in my critical section anyway so it's more or less irrelevant20:23
daveshahThe register placement here is quite different to a real design anyway20:24
daveshahAs the influence from the IO is going to be much higher20:24
awygleyeah20:25
*** FL4SHK has joined #yosys20:25
awyglei'm gonna increase the "registers between this and the I/O" to like 16 each, and see if that changes anything20:26
awygleand then i'll be done playing with it20:26
daveshahYou could also try --out-of-context --placer sa to disable IO insertion (and work around the resulting probably singular matrix)20:27
awygleok, will do20:27
awyglethat got me to ~250 MHz20:28
awyglethat's close to what they have for "64 bit counter"20:28
awygleso i wonder if their 64-bit adder is actually a half-adder20:28
*** cr1901_modern has quit IRC20:29
daveshahYeah, a counter and adder shouldn't be so different20:29
awygleon a 64-bit counter i got 263 MHz, so yeah i think it's fair to say they're cheating20:30
awygleer, 269 MHz rather, so beating their claimed 263 MHz20:30
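To make the gap between the quoted figures concrete, the frequencies can be converted to clock periods. This is plain arithmetic on the numbers mentioned in the conversation (441 MHz from the datasheet table, 207.3 MHz and 217.34 MHz measured for the 64-bit adder); the helper name is hypothetical.

```python
def period_ns(fmax_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    if fmax_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / fmax_mhz


# Figures quoted in the discussion of the 64-bit adder.
results_mhz = {
    "datasheet claim": 441.0,
    "nextpnr + yosys": 207.3,
    "nextpnr + yosys (abc9)": 217.34,
}

for label, mhz in results_mhz.items():
    print(f"{label:24s} {mhz:7.2f} MHz -> {period_ns(mhz):.3f} ns")
```

Seen as periods, the datasheet figure leaves a little under half the time per cycle that the measured adder results need.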
awygleoh daveshah, do you have any idea about the edgeclk vs sclk question i asked the other day? the TN for high-speed I/O says the SCLK topology must be used for frequencies <250 MHz and the eclk must be used for frequencies >400 MHz, does that mean i can do whatever in between those two?20:34
daveshahNo idea20:34
daveshahlitedram also uses edge clock at 100MHz+ fine20:34
awyglemk, i'll probably just design for eclk then20:35
daveshahI didn't know that rule was even a thing20:35
awyglemakes the timing a lot looser20:35
awygleoh but litedram is using the DQS stuff so it probably doesn't count come to think of it20:35
daveshahmaybe20:35
daveshahdepends what they mean by SCLK/ECLK topology really20:35
awygleit's section 5 of TN-02035-1.2 that i'm looking at, if you feel compelled to investigate20:36
awyglebut don't worry about it on my account20:36
*** emeb_mac has joined #yosys20:41
*** strongsaxophone has joined #yosys21:07
*** kristianpaul has quit IRC21:14
*** kristianpaul has joined #yosys21:15
*** cr1901_modern has joined #yosys21:24
*** kraiskil has joined #yosys21:26
*** kraiskil has quit IRC21:30
*** m4ssi has joined #yosys21:39
*** m4ssi has quit IRC22:00
*** cr1901_modern has quit IRC22:33
*** cr1901_modern has joined #yosys22:34
*** strongsaxophone has quit IRC22:41
*** m4ssi has joined #yosys22:53
*** m4ssi has quit IRC23:07
*** lf_ has quit IRC23:15
*** lf has joined #yosys23:16
*** emeb has quit IRC23:43

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!