Wednesday, 2020-06-10

*** tpb has joined #symbiflow00:00
sf-slack<pgielda> That will happen most probably tomorrow00:13
sf-slack<pgielda> Sorry for the delay, one of the engineers responsible for this part is out of office this week by coincidence00:14
-_whitenotifier-f- [yosys] HackerFoo opened issue #79: Need updated Yosys to successfully run attosoc test with nextpnr-xilinx - https://git.io/JfyNj00:18
sf-slack<pgielda> But thanks for the comments in the PR, it would be great to get it in and then expand and fix things00:18
sf-slack<pgielda> We will also then have one set of packages for both xc7 and eos3 flows which would be a dream ;)00:18
HackerFoonextpnr takes 104s on vexriscv, where approximately all (all but 100ms) of that time is taken in fasm and bitstream generation: https://docs.google.com/document/d/1-lDeYYwmfxanod441FUjIoAKW061qmgKsqTcKTb_mXE/edit?usp=sharing00:46
*** andrewb1999 has quit IRC00:46
tpbTitle: Performance Comparison - Google Docs (at docs.google.com)00:46
HackerFooAt least according to fpga-tool-perf00:47
mithro@HackerFoo Did it actually generate anything? There is a lot of N/A output01:01
HackerFoomithro: It generated a 2MB .bit file; I haven't tried it.  fpga-tool-perf doesn't seem to keep the output from nextpnr yet, but it didn't seem to fail.02:02
HackerFooIf you want to try it, you need https://github.com/SymbiFlow/edalize/pull/47, https://github.com/HackerFoo/fpga-tool-perf/tree/nextpnr-vexriscv, and a recent yosys.02:05
HackerFooI re-ran it to capture the output: https://gist.github.com/HackerFoo/7b2a7bed7632a981399d54cba389d74902:15
tpbTitle: vexriscv/nextpnr-xilinx log · GitHub (at gist.github.com)02:15
HackerFooPlacement takes about 10 seconds.02:16
*** citypw has joined #symbiflow02:22
*** futarisIRCcloud has quit IRC02:23
*** rvalles_ has joined #symbiflow02:29
*** rvalles has quit IRC02:30
*** Degi has quit IRC02:31
*** Degi has joined #symbiflow02:33
*** rvalles_ is now known as rvalles03:22
*** az0re has quit IRC03:24
*** futarisIRCcloud has joined #symbiflow03:52
*** tpagarani has joined #symbiflow05:07
*** smkz has quit IRC05:28
*** smkz has joined #symbiflow05:31
*** smkz has quit IRC05:33
*** smkz has joined #symbiflow05:37
*** tpagarani has quit IRC06:17
*** lns1 has joined #symbiflow06:22
*** lns1 has left #symbiflow06:31
*** OmniMancer has joined #symbiflow06:37
*** OmniMancer1 has quit IRC06:39
*** az0re has joined #symbiflow06:46
*** epony has quit IRC07:53
*** epony has joined #symbiflow07:58
*** epony has joined #symbiflow08:01
*** kraiskil has joined #symbiflow08:46
sf-slack<acomodi> nextpnr in tool-perf needs to be fixed to have all the output results correctly parsed, I am dealing with that now08:57
*** mkru has joined #symbiflow09:42
*** mkru has quit IRC10:03
*** lnsharma has joined #symbiflow10:33
*** lnsharma has left #symbiflow10:35
*** craigo has joined #symbiflow10:54
Loftytnt: you around?11:13
tntLofty: I am11:13
LoftyCould you send me your designs so I can play about with them?11:13
tntLofty: sure. http://people.osmocom.org/~tnt/stuff/s3/s3-usb.tar.bz211:17
tntNote that you can only run it through yosys atm ... you can't actually run VPR because I haven't ported the RAM or IO (so you have SB_IO and SB_RAM_4K blackboxes ...), nor worked out the proper interconnect.11:18
*** anuejn_ is now known as anuejn11:22
tntBut so far it's been good enough for me to look at the deepest path with 'show' (something like show n:$abc$8664$techmap\tx_pkt_I.$0\len[10:0][6] %ci*:-dff:-dffc:-SB_RAM40_4K ) and see if I can do better.11:23
LoftyIf ABC9 gets hooked up there'll be sta, which is even more useful11:24
Loftytnt: Is this after you've manually optimised it, or before?11:32
tntWhat I sent ? that's before I did anything.11:37
tntI didn't actually do any optimization anywhere else than on paper ...11:38
LoftyAh, okay11:43
LoftyLet's start by seeing what happens if we map muxes before giving them to ABC11:47
LoftySince to implement a MUX8 in pure-LUT requires 11 inputs, and ABC thinks we have 411:49
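The input count Lofty quotes can be checked with a small sketch. It assumes a plain 2^n:1 mux with n select lines and no shared or constant inputs; the function names are hypothetical and only illustrate the arithmetic.

```python
def mux_input_count(select_bits: int) -> int:
    """Total inputs of a 2**select_bits : 1 mux: data inputs plus select inputs."""
    return 2 ** select_bits + select_bits


def fits_in_single_lut(select_bits: int, lut_width: int) -> bool:
    """True if the whole mux could be expressed as one LUT of the given width."""
    return mux_input_count(select_bits) <= lut_width


# MUX8 has 3 select bits: 8 data + 3 select = 11 inputs.
print(mux_input_count(3))        # 11
print(fits_in_single_lut(3, 4))  # False: far wider than a LUT4
```

Under these assumptions only a MUX2 (3 inputs) fits in a single LUT4, which is why mapping wider muxes before LUT mapping can matter.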
*** kraiskil has quit IRC11:52
LoftyDoesn't seem like you have anything mappable there. At least with `muxcover -nodecode`11:54
LoftyLet's see what happens if we allow decoders11:54
LoftyAgain, nothing11:55
LoftyAh well11:55
tntUnfortunately they're probably never pure-mux. The 8:1 mux example is just like a synthetic worst case.  For the few paths I looked at, it's more subtle how you can use the muxes better.11:56
LoftyAlright, 1467 cells to 1261 cells by using `abc -luts 1,2,2,4`11:57
LoftyLet's use a crappy formula for LC usage: LCs = LUT2/2 + LUT3/2 + LUT411:58
LoftyBefore: 385/2 + 509/2 + 56 = 407 LCs11:59
LoftyAfter: 107/2 + 420/2 + 217 = 498 LCs.12:01
LoftyHmm12:01
tnt How's the ltp ?12:01
Loftylength=712:01
LoftySo equivalent to before it seems12:02
LoftyOkay, what happens if I reduce the area cost of LUT4s to make it try to cover more logic with it?12:03
Lofty109/2 + 389/2 + 238 = 486 LCs12:04
*** kraiskil has joined #symbiflow12:04
LoftyThat's `abc -luts 1,2,2,3`12:05
LoftyI just double-checked with flowmap: it seems the critical path is 7 and it can't actually be shortened12:07
LoftyWhat if we make LUT4s the same cost as LUT3s?12:11
Lofty87/2 + 322/2 + 305 = 509 LCs12:12
LoftyHrm.12:12
LoftyWhat if we make LUT4s unnecessarily expensive?12:13
tntI'll try to come up with a more synthetic example that's easier to analyze and that comes up quite a bit (which is basically a loadable counter with enable).12:13
LoftySure12:13
tntFor instance, it doesn't _use_ dff enable from what I've seen, which increases the length of the logic for nothing.12:13
LoftyThat's because the flow doesn't map dff enable12:14
LoftyWhich might help :P12:14
LoftyI have no idea what the primitive for it is12:17
tntdffe ( Q, D, CLK, EN );12:18
LoftyOkay, so literally just inserting a dff2dffe in the flow pulls out 191 dffes12:20
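Why an unused enable pin lengthens the logic can be shown with a small behavioural sketch. It is not the Yosys cell library or the QuickLogic primitive, only a model that assumes active-high enable; all function names are hypothetical. A flip-flop with enable behaves like a plain flip-flop whose D input is fed by a 2:1 hold mux, and without enable mapping that mux has to be absorbed into the LUT logic in front of the flop.

```python
from itertools import product


def dffe_next(q: bool, d: bool, en: bool) -> bool:
    """Next state of a flip-flop with enable: load D when EN is high, else hold Q."""
    return d if en else q


def mux2(sel: bool, a: bool, b: bool) -> bool:
    """2:1 mux: returns a when sel is low, b when sel is high."""
    return b if sel else a


def dff_with_hold_mux_next(q: bool, d: bool, en: bool) -> bool:
    """Plain flip-flop (next state = its D input) fed by a hold mux in front of it."""
    return mux2(en, q, d)


# Exhaustively compare the two implementations over every (Q, D, EN) combination.
equivalent = all(
    dffe_next(q, d, en) == dff_with_hold_mux_next(q, d, en)
    for q, d, en in product((False, True), repeat=3)
)
print(equivalent)  # True
```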
Lofty<Lofty> Before: 385/2 + 509/2 + 56 = 407 LCs12:27
LoftyAfter: 472/2 + 375/2 + 46 = 46912:27
LoftyEh?12:27
Lofty...Ah12:28
LoftyHere's the answer: 385/2 + 509/2 + 56 = 502 LCs :P12:28
LoftyThanks windows calculator12:29
LoftyReal helpful of you12:29
tntlol12:29
LoftyOkay, so, dffes help12:31
LoftyLet's go back to LUT mapping with the math problems solved12:36
Lofty186/2 + 345/2 + 178 = 443 LCs (abc -luts 1,2,2,4)12:37
LoftySo I've shaved 60 LCs by configuring ABC better and using DFFEs12:41
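The LC estimate used above can be reproduced with a short sketch. It assumes integer (floor) division for the LUT2 and LUT3 terms, which is an inference from the fact that the totals quoted in the log match it; `estimate_lcs` is a hypothetical helper name.

```python
def estimate_lcs(lut2: int, lut3: int, lut4: int) -> int:
    """Rough logic-cell estimate: LCs = LUT2/2 + LUT3/2 + LUT4 (floor division assumed)."""
    return lut2 // 2 + lut3 // 2 + lut4


before = estimate_lcs(385, 509, 56)   # original flow
after = estimate_lcs(186, 345, 178)   # abc -luts 1,2,2,4 plus DFFEs
print(before, after, before - after)  # 502 443 59
```

The difference of 59 is the "60 LCs" rounded.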
*** futarisIRCcloud has quit IRC12:43
*** citypw has quit IRC12:44
LoftySurprisingly, this design isn't too awful to run `freduce` on12:45
Lofty200/2 + 291/2 + 168 = 413 LCs12:45
LoftyAlthough maybe put that behind an option :P12:46
tntWhat's freduce ?12:49
LoftyIt looks for functionally-equivalent subcircuits (e.g. subcircuits that return the same value) to reduce area12:49
LoftyUnfortunately the pass is horrifically slow for anything larger than toy examples12:50
LoftyBecause it attempts to SAT compare stuff without even trying12:50
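The idea of finding functionally equivalent subcircuits can be sketched on toy functions. This is not how `freduce` works internally (Lofty describes it as SAT-based); the sketch swaps in exhaustive truth-table comparison, which is only feasible for a handful of inputs, and every name in it is hypothetical.

```python
from itertools import product


def truth_table(fn, n_inputs: int) -> tuple:
    """Evaluate fn on every input combination and return the outputs as a tuple."""
    return tuple(bool(fn(*bits)) for bits in product((False, True), repeat=n_inputs))


def group_equivalent(candidates: dict, n_inputs: int) -> list:
    """Group candidate names whose truth tables are identical."""
    groups = {}
    for name, fn in candidates.items():
        groups.setdefault(truth_table(fn, n_inputs), []).append(name)
    # Only groups with more than one member offer a merge opportunity.
    return [names for names in groups.values() if len(names) > 1]


candidates = {
    "and_gate": lambda a, b: a and b,
    "nor_of_nots": lambda a, b: not ((not a) or (not b)),  # De Morgan: same as AND
    "xor_gate": lambda a, b: a != b,
}
print(group_equivalent(candidates, 2))  # [['and_gate', 'nor_of_nots']]
```

Each equivalent group could be replaced by a single driver, which is where the area reduction comes from. The table size doubles with every added input, which gives a feel for why this kind of comparison gets slow on anything beyond toy examples.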
*** olegfink1 has joined #symbiflow12:54
LoftyHmmm12:55
LoftyWithout going to the effort of writing an ABC9 flow, I think I've mostly exhausted the things I can think of12:56
*** smkz has quit IRC12:58
LoftyActually, there is one12:58
LoftyNope, doesn't apply here.12:59
*** xobs1 has joined #symbiflow13:00
*** FFY00 has quit IRC13:01
*** promach3 has quit IRC13:01
*** xobs has quit IRC13:01
*** olegfink has quit IRC13:01
LoftyAnyway, 60 LCs by configuring ABC better, another 30 by using freduce13:02
*** FFY00 has joined #symbiflow13:03
tntDo you have a diff I can apply to yosys to try that here ?13:04
tntDid the dffe have any impact on ltp btw ?13:05
Loftyhttps://gist.github.com/ZirconiumX/ce9a6e4cc286294d9fd56fa04c103c4b13:06
tpbTitle: synth_quicklogic.diff · GitHub (at gist.github.com)13:06
LoftyNo, but it reduces area13:06
LoftyAs I said: you can't do any better with depth13:06
LoftyAt least without resynthesis13:06
tntI'm also wondering why the flow is two steps of mapping.  synth_quicklogic is the first stage but then they actually re-run yosys a second time with a different techmapping pass.13:08
*** smkz has joined #symbiflow13:09
Lofty?!13:09
LoftyThat's...what.13:09
tnthttps://pastebin.com/RHmrqy0n13:11
tpbTitle: synth.sh: yosys -p "tcl ${SYNTH_TCL_PATH}" -l $LOG ${VERILOG_FILES[*]} pytho - Pastebin.com (at pastebin.com)13:11
LoftyThis should be within synth_quicklogic13:12
tntYeah, they have a second level cells_map.v that maps to T_FRAG / B_FRAG and other types of cells used by vtr rather than lut{2,3,4}.13:15
* Lofty sighs13:15
tnthttps://pastebin.com/qjLFp01T13:17
tpbTitle: [VeriLog] // ============================================================================ - Pastebin.com (at pastebin.com)13:17
tntIt seems to also have stuff like "// FIXME: Always select QDI as the FF's input" which seems rather sub-optimal ... or maybe vpr's packing step can 'fix' that if it sees it can pack the Q FRAG with the T/B FRAG13:18
tntAlso the LUT4 are mapped to 2 LUT3 + 1 F_FRAG. So not sure why that is. I'm hoping the F_FRAG can also be placed by VTR as the TBS_mux and not just the bottom Fmux because that seems like a huge waste.13:20
LoftyMmm13:23
*** az0re has quit IRC13:28
Loftytnt: But presumably shaving 20% off the area helps a bit, right?13:31
tntIt helps for sure, but atm I'm more concerned about the fmax / depth.13:35
tntI have to finish porting so I can make a full run synth->pnr to see the actual frequency. Less area will help placement for sure.13:36
LoftyI'm fairly sure ABC9 will help a lot here13:37
LoftyBy the way, tnt, I added some other stuff you'll probably find helpful13:41
tntwhere ?13:42
LoftyI just updated the diff I gave earlier13:43
LoftySorry, Git was being a pain13:44
LoftyYou'll now see the names of wires in the LTP output13:44
LoftyShould help your optimisation efforts13:44
tntAh yeah, naming :)13:44
tnttx13:45
tntLofty: it might not solve the longest one, but still provides a good reduction : https://pastebin.com/Swhjs6Gn14:01
tpbTitle: Paths length distribution: Before After 0: 270 307 1: 19 - Pastebin.com (at pastebin.com)14:01
tnt(the method I use to collect them is flawed, but I think it still gives a decent idea ...)14:01
LoftyYay14:01
LoftyThat should improve routability a bit then14:02
Loftyi.e. less shitty runtime14:02
*** futarisIRCcloud has joined #symbiflow14:07
*** kraiskil has quit IRC14:31
*** kraiskil has joined #symbiflow14:46
sf-slack<acomodi> mithro, HackerFoo: I have merged a fix in tool-perf to parse nextpnr results (frequency, runtime and resources)15:12
*** OmniMancer has quit IRC15:19
*** OmniMancer has joined #symbiflow15:20
*** epony has quit IRC15:32
*** mangelis has quit IRC15:42
*** craigo has quit IRC15:46
*** kraiskil has quit IRC15:52
*** mkru has joined #symbiflow16:26
*** mangelis has joined #symbiflow16:30
*** mkru has quit IRC16:42
mithroLofty: awesome work on the synthesis stuff16:54
LoftyIt was a handful of hours that I could spare16:54
mithroLofty: did you see that QuickLogic provide full liberty files which include timing for all the parts?16:55
mithroLofty: https://github.com/QuickLogic-Corp/EOS-S3/tree/master/Timing%20Data%20Files16:55
tpbTitle: EOS-S3/Timing Data Files at master · QuickLogic-Corp/EOS-S3 · GitHub (at github.com)16:55
LoftyI did, yes16:56
LoftyBut porting to ABC9 is a big enough task that I'm not going to do it for free16:57
LoftyHousehold expenditures, etc16:58
sf-slack<acomodi> litghost, HackerFoo: I have an almost working lookahead. I have merged the lookahead creation from the connection box and used the SRC/OPIN --> CHAN and CHAN --> IPIN information from the upstream lookahead map17:00
litghostacomodi: Define almost working?  It routes well?17:01
sf-slack<acomodi> almost working because it still takes more time than the connection box lookahead (271 seconds vs ~130 seconds)17:01
sf-slack<acomodi> I still need to check whether the bitstream works though, but the routing  step completed successfully17:01
litghostacomodi: How long did the untuned version take?17:02
sf-slack<acomodi> By untuned you mean without the information on SRC/OPIN --> CHAN and CHAN --> IPIN?17:02
sf-slack<acomodi> If that's the case it took 313 seconds17:03
sf-slack<acomodi> Also, the lookahead generation took 473.11 seconds with NUM_WORKERS set to 2017:03
litghostacomodi: I assume this is on the A50 fabric?17:04
sf-slack<acomodi> Yes17:04
sf-slack<acomodi> I still need to do more testing and probably fix some things, but I think we are on the right path17:05
litghostacomodi: 313 -> 271 isn't a huge improvement, but it is moving in the right direction.  It is likely worth comparing how and when the lookahead mispredictions occur.  I expect you can use that to inform how to iterate17:05
*** az0re has joined #symbiflow17:27
*** epony has joined #symbiflow17:35
*** kraiskil has joined #symbiflow17:46
*** mkru has joined #symbiflow17:57
sf-slack<acomodi> Sure, I'll try and get it down to reasonable run-time. Hopefully matching the current lookahead. I have tested the routed design on HW and it works18:17
litghostacomodi: The runtime doesn't have to match exactly, but 271 vs 130 is a large enough difference that it points to the refactored map lookahead still mispredicting pretty badly18:18
litghostacomodi: Identifying how the new map lookahead mispredicts will enable targeted development to reduce the mispredicts18:19
litghostacomodi: Another thing I expect is the new map lookahead output file to be significantly smaller than the connection box lookahead.  Is that true?18:21
sf-slack<acomodi> Indeed, here there is a pretty huge benefit. • refactored lookahead: 17M • connection box lookahead: 557<18:28
sf-slack<acomodi> *M18:28
sf-slack<acomodi> I will need to run more detailed routes, maybe there are some specific nets that encounter lots of mispredicts18:29
litghostacomodi: That is really good!  Hopefully with some further development we can capture the relevant data to make the map lookahead as good as the connection box lookahead, without the extra data18:29
litghostacomodi: When I was doing initial development for the connection box lookahead, I enabled router profiling, and made sure to print per-connection route times per iteration, and printed the worst connection (OPIN -> IPIN) times18:30
litghostacomodi: From there I used the routing_diag tool at criticality == 1 to find lookahead mispredictions18:31
litghostacomodi: You may find that you will need to have several connections to suss out where the mispredictions were coming from18:31
litghostacomodi: However, if the lookahead is close, you will likely see a step function somewhere in the path that points to the current error source18:31
litghostacomodi: Another thing is when you are examining just 1 connection, you can easily compare the ideal route (e.g. astar_fac = 0) versus the route at various A* levels18:32
litghostacomodi: With a well tuned lookahead, the router should behave similarly at A* = 0 and A* ~= .1 - .518:33
litghostacomodi: Once that is working well, A* = 1 should return a good route quickly, and A* = 1.05 ~ 1.2 should return a route quickly, but with somewhat worse path delay18:33
litghostacomodi: Ideally A* <= 1 should all return best (or close to best) delay, and A* > 1 < 1.2 should return worse path delay, but with increased speed (via decreased search space)18:34
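The effect of the A* factor described above can be sketched with a toy weighted A* search. This is not VPR's router: it assumes an open 4-connected grid with unit edge cost and uses Manhattan distance as a stand-in for the lookahead. Only the name `astar_fac` comes from the discussion; everything else is hypothetical.

```python
import heapq


def route(width, height, src, dst, astar_fac):
    """Weighted A* on an open grid. Returns (path_cost, nodes_expanded).

    astar_fac scales the lookahead estimate; astar_fac = 0 ignores it entirely,
    which makes the search behave like Dijkstra's algorithm.
    """
    def lookahead(node):
        # Manhattan distance: exact on an open grid, so it never overestimates.
        return abs(node[0] - dst[0]) + abs(node[1] - dst[1])

    best = {src: 0}
    heap = [(astar_fac * lookahead(src), 0, src)]
    closed = set()
    expanded = 0
    while heap:
        _, cost, node = heapq.heappop(heap)
        if node in closed:
            continue  # stale heap entry for an already expanded node
        closed.add(node)
        expanded += 1
        if node == dst:
            return cost, expanded
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            new_cost = cost + 1
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                priority = new_cost + astar_fac * lookahead(nxt)
                heapq.heappush(heap, (priority, new_cost, nxt))
    return None, expanded


for fac in (0.0, 1.0, 1.2):
    cost, expanded = route(20, 20, (0, 0), (9, 9), fac)
    print(f"astar_fac={fac}: cost={cost}, expanded={expanded}")
```

In this toy the lookahead is exact, so the path cost stays the same while the number of expanded nodes drops as `astar_fac` grows. With a mispredicting lookahead the low-factor and high-factor searches diverge in cost or in search effort, which is the kind of signal litghost suggests looking for on a single connection.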
sf-slack<acomodi> litghost: All right, thanks for the insight, I will get to analyze all of this on smaller circuits then, to keep things simple18:37
litghostacomodi: No, my suggestion was not to use a smaller circuit, but to use a circuit, and pick the worst connection route time, and debug that connection specifically18:37
sf-slack<acomodi> One question, by router profiling you mean to have the debug logging enabled or is that something else entirely?18:37
litghostacomodi: https://github.com/SymbiFlow/vtr-verilog-to-routing/blob/master%2Bwip/vpr/src/route/route_profiling.h#L718:38
tpbTitle: vtr-verilog-to-routing/route_profiling.h at master+wip · SymbiFlow/vtr-verilog-to-routing · GitHub (at github.com)18:38
litghosthttps://github.com/SymbiFlow/vtr-verilog-to-routing/blob/master%2Bwip/vpr/src/route/route_profiling.cpp#L23318:38
tpbTitle: vtr-verilog-to-routing/route_profiling.cpp at master+wip · SymbiFlow/vtr-verilog-to-routing · GitHub (at github.com)18:38
sf-slack<acomodi> Ok, great, thanks18:39
*** mkru has quit IRC18:42
*** mkru has joined #symbiflow18:53
*** futarisIRCcloud has quit IRC19:07
*** kraiskil has quit IRC19:45
*** kraiskil has joined #symbiflow19:48
*** tonlage has joined #symbiflow20:54
*** OmniMancer has joined #symbiflow21:15
*** andrewb1999 has joined #symbiflow21:25
andrewb1999Is there a reason that the wire CLK_HROW_TOP_R_X60Y130/CLK_HROW_CK_BUFHCLK_L7 would exist in the database but not CLK_HROW_TOP_R_X60Y130/CLK_HROW_CK_BUFHCLK_L6?21:30
andrewb1999Both of these wires were chosen by Vivado as partition pins, but the xray python lookup scripts can only find L7, not L621:32
andrewb1999All other partition pins can be found fine too21:32
andrewb1999Or is it possible this is an issue with using 2 clocks?21:35
andrewb1999Does symbiflow support multiple clocks yet?21:35
litghostandrewb1999: 7-series VPR does support multiple clocks, but it doesn't currently support explicit BUFH instancing21:36
litghostandrewb1999: There is no support for fully route-through sites right now in VPR21:37
litghostandrewb1999: You can operate with all explicit BUFH instancing, or all implicit BUFH instancing (via routing)21:37
litghostandrewb1999: Mixing them is not supported at this time21:38
litghostandrewb1999: Support could be added, but it is unclear why it would be a priority21:39
andrewb1999litghost: Ok thanks, will probably switch to one clock now for simplicity21:40
litghostandrewb1999: FYI, one vs multiple clocks has no relevance to the BUFH discussion21:40
andrewb1999litghost: Oh ok21:41
*** futarisIRCcloud has joined #symbiflow21:41
*** az0re has quit IRC21:51
*** az0re has joined #symbiflow22:07
*** gsmecher has quit IRC22:17
*** kraiskil has quit IRC22:18
andrewb1999Is the difference between graph_limit and roi just whether synth_tiles are created and if fasm from the harness gets merged?22:23
andrewb1999Or does it get treated differently in other ways?22:23
litghostandrewb1999: The ROI provides the graph limit as part of the ROI spec (along with the FASM for the bits for the harness)22:28
litghostandrewb1999: The graph_limit is for ROI-less designs, where we want a sub-graph excluding some of the fabric22:29
*** tonlage has quit IRC23:07
