*** tpb <[email protected]> has joined #symbiflow | 00:00 | |
tpb | <bl0x_> Hi, I've just got hit by the fact that I forgot to assign a suitable clock to an instantiated module's clock input (in this case xc7 MMCME2_ADV). Symbiflow was happy with it, Vivado complained about it during DRC. Is that a symbiflow/yosys issue? How is DRC done in symbiflow/yosys? | 01:28 |
---|---|---|
tpb | <lkcl> bl0x_, hi - there's a lot of things missing from the FOSS HDL tools at the moment, which if you're used to proprietary ones will seem strange that they're lacking. over time this will get better, if there's proper funding and/or resources, you know how that works with FOSS :) | 05:21 |
tpb | <lkcl> i've been working with VLSI ASIC/FPGA for about 3-4 years now, and DRC and error-checking in general i've found is quite sparse. personally i've substituted huge quantities of unit tests for that lack, and that's worked well for me | 05:24 |
tpb | <lkcl> someone else on here may have had different experiences and know of some Libre DRC tools? i'd be interested to know if they exist, as well | 05:26 |
tpb | <sf-slack> <acomodi> lkcl: I believe that you bumped into a current limitation of VPR regarding the direct connections between macro blocks such as carry chains, whose COUT - CIN connections do not enter the general interconnect and require the definition of direct inter-block connections (https://docs.verilogtorouting.org/en/latest/arch/reference/#direct-inter-block-connections) | 08:14 |
tpb | <tpb> Title: Architecture Reference — Verilog-to-Routing 8.1.0-dev documentation (at docs.verilogtorouting.org) | 08:14 |
tpb | <sf-slack> <acomodi> these connections are currently present for the carry chains, but due to some slight variation in the xc7 fabric, the offset between the carry chains' "y" locations might be two units rather than one, I believe when the carry-chain has to "jump" over the clock spine in the middle of a clock region | 08:16 |
tpb | <sf-slack> <acomodi> I think that VPR should be enhanced to have multiple ways of building up a macro cluster (where a macro cluster is a cluster of blocks that have direct connections, such as a set of carry-chains connected one to the other). Currently IIRC there is only one possible inter-block direct definition that can be defined for two block pins (COUT - CIN) | 08:19 |
tpb | <sf-slack> <acomodi> For the time being, having a carry chain longer than 25 (which should be the maximum length of carry chain blocks that have a one-length "jump" between one another instead of the less common two-length one), should result in the error you reported | 08:22 |
tpb | <sf-slack> <acomodi> @fahrenkrog: IIRC latch primitives such as LDCE are not currently supported (and in general they should be avoided). There was a PR opened a while ago to add support for those: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1417 | 08:24 |
tpb | <bl0x_> lkcl: Sure, I am not complaining about symbiflow/yosys! On the contrary =) But this reminded me to always cross-check with a Vivado run if things are really OK. Also, if I know where to look, then I can check how to contribute whenever something seems "strange" or "lacking". | 09:20 |
tpb | <lkcl> bl0x_, no, me neither, i mean we wouldn't be able to do anything without the work done by everyone on these tools. if you were developing an ASIC, i could refer you to klayout, apparently someone put together some DRC checking for ASIC development. | 11:47 |
tpb | <bl0x_> lkcl: oh, but no. I'm doing FPGA for the time being. | 12:05 |
tpb | <bl0x_> I've seen that some basic checks are done in techmap. Perhaps that can be enhanced. | 12:08 |
tpb | <sf-slack> <fahrenkrog> @acomodi Thanks, I will have a read through that and see how to improve the design. Cheers! | 12:08 |
tpb | <lkcl> acomodi: thanks for responding. i'm not yet familiar enough with the internals of vtr to understand fully there, but intuitively i get it (and that you have a handle on the problem). | 12:48 |
tpb | <lkcl> so, to help me understand: is this because the (large) design i am doing has an above-average number of large add/sub/cmp operations? or are carry-chains a general-purpose block for propagating large inter-connected combinatorial circuits? | 12:50 |
tpb | <lkcl> for example, if we have a global pipeline "stall" signal that connects to 10 separate (rather large) pipelines, or an MMU with a TLB that ended up being a (massive) combinatorial circuit? | 12:52 |
tpb | <sf-slack> <acomodi> I believe it is the former: probably your design uses quite large add/sub/cmp operations that lead synthesis to generate a very long carry-chain (>25 carry blocks) that VPR cannot handle at this point. A possible workaround might be to cut the carry-chain into sub-carry-chains during the synthesis step | 12:58 |
tpb | <lkcl> ahh yeah we have a divide unit which can do square-root and reciprocal-square-root. for 64-bit outputs that needs 3x 64-bit for the divisor so 192-bit add/sub | 13:14 |
tpb | <lkcl> as that's currently not needed (Power ISA integer ops don't have sqrt and certainly not rsqrt) i think we can get away with cutting that out | 13:14 |
tpb | <lkcl> thank you! | 13:14 |
tpb | <lkcl> if however we wanted to cut the carry-chain, how would i go about identifying it? what am i looking for? | 13:17 |
tpb | <lkcl> ah that's interesting | 13:20 |
tpb | <sf-slack> <acomodi> So, first thought would be to add a step that identifies too-long-carry-chains and splits them to this script https://github.com/SymbiFlow/symbiflow-arch-defs/blob/master/utils/fix_xc7_carry.py | 13:21 |
tpb | <lkcl> https://github.com/SymbiFlow/symbiflow-arch-defs/blob/master/utils/fix_xc7_carry.py | 13:23 |
tpb | <lkcl> yes lol i was just trying to find the source for that :) | 13:23 |
tpb | * lkcl just having a look at clean_carry_out.v to get a handle on this | 13:26 |
tpb | <lkcl> acomodi: okaaay i've got a mini repro design (in nmigen) which creates a 192-bit adder | 14:44 |
tpb | <lkcl> and can see that fix_xc7_carry.py is being run a bit too early | 14:44 |
tpb | <lkcl> as in: it's run on the design *before* any CARRY4_VPR blocks are present | 14:46 |
tpb | <lkcl> the files go through a processing stream: original top.v to json, then that goes into fix_xc7_carry.py, then the output from that goes into abc9 | 14:48 |
tpb | <lkcl> but i think it should have been run _after_ the abc9 processing | 14:49 |
tpb | <lkcl> i'll try it manually, see what happens | 14:49 |
tpb | <lkcl> ok i've a mini-script which repros the techmap fixups, which allows the port_direction attributes to be inserted into the post-abc9 .v file... | 15:14 |
tpb | <lkcl> getting this error: | 15:14 |
tpb | <lkcl> File "fix_xc7_carry.py", line 317, in find_carry4_chains | 15:14 |
tpb | <lkcl> assert direct_cell["type"] == "CARRY_CO_DIRECT", direct_cellname | 15:14 |
tpb | <lkcl> AssertionError: _296_ | 15:14 |
tpb | <lkcl> actual type is 'CARRY4_VPR', | 15:15 |
tpb | <lkcl> ermermermermerm... i think i know what this is | 15:20 |
tpb | <lkcl> https://github.com/SymbiFlow/symbiflow-arch-defs/blob/8a7fc2ff272222641fc5c887ca7d81a949c62bae/utils/fix_xc7_carry.py#L313 | 15:20 |
tpb | <lkcl> suspect that should be [1] not [0] | 15:20 |
tpb | <lkcl> ngggh nope. getting a little lost here. | 15:31 |
tpb | <lkcl> ok got it - back to the pre-fixup json, there it is. | 15:37 |
tpb | <sf-slack> <acomodi> So, for context, the techmapping forces the creation of the `CARRY_CO_DIRECT` and `CARRY_COUT_PLUG` additional cells which are required to force the VPR packer to correctly use the `CIN - COUT` pins for the direct connection | 15:38 |
tpb | <lkcl> yes, am up to speed, just putting some debug prints in | 15:40 |
tpb | <lkcl> a 192-bit adder creates a chain of length 48 | 15:41 |
tpb | <sf-slack> <acomodi> The idea here would be to iterate over a carry-chain starting from its head and, upon reaching the limit, remove the following `CARRY_CO_DIRECT` and `CARRY_COUT_PLUG` and restore the original connection coming out of the original `synth_xilinx` pass, so that, in VPR's eyes, the carry-chain can be packed as two chains of length 25 and 23 | 15:41 |
tpb | <lkcl> what type of cell would need to be put in, here? | 15:41 |
tpb | <lkcl> mmmm ahh ok | 15:41 |
tpb | <lkcl> mrhm i should be able to find that file (original synth pass) | 15:42 |
tpb | <lkcl> "symbiflow-arch-defs/xc/xc7/yosys/synth.tcl" | 15:42 |
tpb | <lkcl> in there somewhere | 15:42 |
tpb | <lkcl> ah, you mean the {name}.premap.v file? | 15:43 |
tpb | <sf-slack> <acomodi> I think you have all the information in that script, and basically you might just connect the `carry_co_direct` input to the `carry_cout_plug` output | 15:43 |
tpb | <lkcl> been doing python for 20 years now, but the data format's new to me. let me do some more debug prints - i can see what fixup_cin is doing so that establishes a baseline/principle | 15:46 |
tpb | <lkcl> so just to check: does the {top}.carry_fixup.json file already have CARRY_CO_DIRECT connected directly to CARRY_COUT_PLUG? | 15:58 |
tpb | <lkcl> (graphviz - yosys show top - is taking several *minutes*, otherwise i'd be able to see it myself) | 15:58 |
tpb | <lkcl> i need to start from a smaller example - 192 bits is too large, taking far too long | 16:00 |
tpb | <sf-slack> <acomodi> tbh I'd need to navigate through the carry chain handling, as I am not too familiar with it myself. I might be mistaken and there are actually only the additional carry_cout_plug cells. I suggest starting from an 8-bit counter, which is enough to understand what is going on there | 16:01 |
tpb | <lkcl> yes, i reduced down to 16-bit and it's still 100% CPU... oh hooray! graphviz completed after 2 minutes :) | 16:04 |
tpb | * lkcl restarting with 8-bit | 16:05 |
tpb | <lkcl> that's better. only about 15 seconds. whew | 16:06 |
tpb | <lkcl> rrright. ran fix_xc7_carry.py on an 8-bit add and it doesn't actually look like any change was made, hm. | 16:09 |
tpb | <lkcl> top_synth.v.premap.v contains just CARRY4 blocks. | 16:12 |
tpb | <sf-slack> <acomodi> Yeah, the `CARRY4_VPR` are generated after mapping, so the script should run on the already-mapped netlist | 16:14 |
tpb | * lkcl wondering if dropping in an IBUF would do the trick | 16:14 |
tpb | <lkcl> nope those are for pads, it looks like | 16:15 |
tpb | <lkcl> a pair of INVs back-to-back? | 16:15 |
tpb | <lkcl> (yuk!) | 16:16 |
tpb | <lkcl> is there a non-inverting cell? | 16:17 |
tpb | <lkcl> BUFG with an "EN" set to 1 would do it | 16:21 |
tpb | * lkcl thinking of changing random CARRY_COUT_PLUGs to BUFGs | 16:23 |
tpb | * lkcl afk | 16:23 |
tpb | <lkcl> The clock net '$auto$alumacc.cc:485:replace_alu$1437.CO[3]' driving 'BUFGCTRL_VPR' sources at logic which is not allowed! | 17:28 |
tpb | <lkcl> grr no, not allowed | 17:28 |
tpb | <lkcl> MUXCY is a virtual cell (mapped) | 17:48 |
tpb | <lkcl> let's try a MUXF6... | 17:54 |
tpb | <lkcl> with a hard-coded S=0 | 17:55 |
tpb | <lkcl> blif file was accepted into vtr... | 18:04 |
tpb | <lkcl> damnit that didn't work either | 18:32 |
tpb | <lkcl> Cannot route from BLK-TL-CLBLM_R[0].CLBLM_M_AMUX[0] (RR node: 5365158 class: 82 capacity: 1 fan-in: 0 fan-out: 1 SOURCE:5365158 (134,72)) to BLK-TL-CLBLM_R[0].CLBLM_L_CIN[0] (RR node: 5365097 class: 21 capacity: 1 fan-in: 1 fan-out: 0 SINK:5365097 (134,72)) -- no possible path | 18:32 |
tpb | <lkcl> i think something more sophisticated (earlier) might be needed here | 18:33 |
tpb | <lkcl> to rework adds of an excessive length to ones that are explicitly say 128-bit (max) but multiple of them, chained together with a single bit-carry | 18:34 |
tpb | <lkcl> done at the verilog level (or more like yosys ilang) | 18:35 |
*** maartenBE <maartenBE!~maartenBE@freenode/user/maartenBE> has quit IRC (Ping timeout: 120 seconds) | 19:27 | |
*** maartenBE <[email protected]> has joined #symbiflow | 19:28 | |
tpb | <lkcl> acomodi: ah ha! a potential temporary workaround | 23:40 |
tpb | <lkcl> log(" -nocarry\n"); | 23:41 |
tpb | <lkcl> log(" do not use XORCY/MUXCY/CARRY4 cells in output netlist\n"); | 23:41 |
tpb | <lkcl> in yosys techlibs/xilinx/synth_xilinx.cc | 23:42 |
tpb | <lkcl> i'm tempted to suggest the idea of a new techmap, either replacing yosys/share/xilinx/arith_map.v $alu map | 23:49 |
tpb | <lkcl> which spots if the length of the $alu is greater than 90 or so bits | 23:50 |
tpb | <lkcl> and does the split there, with a manual carry of one extra bit and some intermediary signals | 23:51 |
tpb | <lkcl> ok that successfully maps to a batch of LUT4/5/6s (using synth_xilinx -nocarry) | 23:54 |
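
The arithmetic behind the discussion above can be sketched in plain Python. This is only a model, not part of fix_xc7_carry.py or yosys: the function names (`carry4_cells`, `split_chain`, `chunked_add`) are hypothetical, the 25-cell limit is the figure quoted in the log, and the 4 bits per CARRY4 cell is assumed from the log's "192-bit adder creates a chain of length 48". It shows how a 48-cell chain would be cut into 25 + 23, and that a wide add split into narrower adds linked by a single carry bit gives the same result as the full-width add.

```python
# Assumption: one CARRY4 cell covers 4 bits of an adder's carry logic.
CARRY4_BITS = 4
# Assumption: longest chain VPR can pack, as quoted in the log.
MAX_CHAIN_CELLS = 25


def carry4_cells(width):
    """Number of CARRY4 cells needed for a width-bit adder (ceiling division)."""
    return -(-width // CARRY4_BITS)


def split_chain(cells, limit=MAX_CHAIN_CELLS):
    """Greedily cut a chain of `cells` cells into segments of at most `limit`."""
    segments = []
    while cells > 0:
        take = min(cells, limit)
        segments.append(take)
        cells -= take
    return segments


def chunked_add(a, b, width, chunk_bits):
    """Add two width-bit values chunk by chunk, passing one carry bit along."""
    mask = (1 << chunk_bits) - 1
    result, carry = 0, 0
    for shift in range(0, width, chunk_bits):
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> chunk_bits  # 0 or 1: the single bit carried onward
    return result & ((1 << width) - 1)  # wrap like a width-bit hardware adder


if __name__ == "__main__":
    print(carry4_cells(192))                 # 48
    print(split_chain(carry4_cells(192)))    # [25, 23]
    a, b = (1 << 192) - 1, 1
    print(chunked_add(a, b, 192, 96) == (a + b) % (1 << 192))  # True
```

With these assumptions, any chunk of 100 bits or fewer stays within 25 CARRY4 cells, which is in line with the "90 or so bits" threshold suggested for a split at the `$alu` techmap level.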