*** tpb has joined #yosys | 00:00 | |
*** emeb has quit IRC | 00:14 | |
*** emeb_mac has joined #yosys | 00:17 | |
*** proteusguy has quit IRC | 01:04 | |
*** gsi__ has joined #yosys | 01:06 | |
*** gsi_ has quit IRC | 01:08 | |
*** rektide has joined #yosys | 01:34 | |
*** proteusguy has joined #yosys | 02:09 | |
*** AlexDaniel has quit IRC | 02:25 | |
*** zignig has joined #yosys | 02:29 | |
*** citypw has joined #yosys | 02:39 | |
*** adjtm has joined #yosys | 02:47 | |
*** adjtm_ has quit IRC | 02:49 | |
*** proteusguy has quit IRC | 02:50 | |
*** PyroPeter has quit IRC | 02:53 | |
*** proteusguy has joined #yosys | 03:03 | |
*** PyroPeter has joined #yosys | 03:06 | |
*** adjtm has quit IRC | 03:26 | |
*** adjtm has joined #yosys | 03:26 | |
*** X-Scale has quit IRC | 03:38 | |
*** vonnieda has joined #yosys | 03:55 | |
bwidawsk | is there like a go-to riscv I should be working with? I'd been looking at the Wishbone VexRiscv | 04:06 |
---|---|---|
sorear | are you looking for anything in particular? | 04:08 |
bwidawsk | sorear› just looking for something that will synthesize with diamond and yosys, and is "complex" to do a comparison | 04:09 |
*** rohitksingh has joined #yosys | 04:23 | |
*** rohitksingh has quit IRC | 04:35 | |
*** rohitksingh_work has joined #yosys | 04:50 | |
*** _whitelogger has quit IRC | 06:05 | |
*** _whitelogger has joined #yosys | 06:07 | |
corecode | ac | 06:33 |
corecode | woops | 06:33 |
*** emeb_mac has quit IRC | 06:39 | |
*** voxadam has quit IRC | 06:44 | |
*** voxadam has joined #yosys | 06:45 | |
*** dys has quit IRC | 07:02 | |
*** gsi__ is now known as gsi_ | 07:17 | |
*** fsasm_ has joined #yosys | 07:27 | |
*** dys has joined #yosys | 07:32 | |
*** m4ssi has joined #yosys | 07:37 | |
*** citypw has quit IRC | 07:41 | |
*** citypw has joined #yosys | 07:42 | |
*** fsasm_ has quit IRC | 08:41 | |
*** vidbina has joined #yosys | 09:06 | |
*** rohitksingh has joined #yosys | 09:12 | |
*** dys has quit IRC | 09:27 | |
*** vidbina has quit IRC | 09:41 | |
*** AlexDaniel has joined #yosys | 09:46 | |
*** dys has joined #yosys | 09:47 | |
*** AlexDaniel has quit IRC | 10:23 | |
*** citypw has quit IRC | 11:00 | |
*** citypw has joined #yosys | 11:04 | |
*** jakobwenzel has quit IRC | 11:43 | |
*** jakobwenzel has joined #yosys | 11:44 | |
ZirconiumX | So, adding DFFSRs (in the form of the only AC-family chip I could find, the 74AC11074) actually reduces the overall chip count, despite the '74 only having 2 DFFSRs | 11:50 |
ZirconiumX | (thanks daveshah) | 11:51 |
*** fsasm has joined #yosys | 12:06 | |
*** shorne has joined #yosys | 12:13 | |
*** AlexDaniel has joined #yosys | 12:21 | |
*** rohitksingh has quit IRC | 12:24 | |
*** proteusguy has quit IRC | 12:32 | |
*** rohitksingh has joined #yosys | 12:35 | |
*** AlexDaniel has quit IRC | 12:44 | |
*** rohitksingh has quit IRC | 12:47 | |
*** citypw has quit IRC | 13:11 | |
*** rohitksingh has joined #yosys | 13:13 | |
*** rohitksingh has quit IRC | 13:30 | |
*** X-Scale has joined #yosys | 13:33 | |
*** citypw has joined #yosys | 13:44 | |
*** vonnieda has quit IRC | 13:48 | |
*** rohitksingh_work has quit IRC | 13:52 | |
*** fsasm has quit IRC | 13:53 | |
*** fsasm has joined #yosys | 13:59 | |
*** m4ssi has quit IRC | 14:44 | |
*** emeb has joined #yosys | 14:58 | |
*** AlexDaniel has joined #yosys | 15:03 | |
*** rohitksingh has joined #yosys | 15:06 | |
*** unkraut has quit IRC | 15:08 | |
*** unkraut has joined #yosys | 15:10 | |
ZirconiumX | daveshah: My adder techmap pass seems to produce more Yosys warnings. Mind telling me how I fucked up this time? | 15:23 |
ZirconiumX | https://github.com/ZirconiumX/74xx-liberty/blob/master/74_adder.v | 15:23 |
tpb | Title: 74xx-liberty/74_adder.v at master · ZirconiumX/74xx-liberty · GitHub (at github.com) | 15:23 |
ZirconiumX | ../74_adder.v:32: Warning: Range [3:0] select out of bounds on signal `\AA': Setting 1 MSB bits to undef. | 15:24 |
ZirconiumX | ../74_adder.v:33: Warning: Range [3:0] select out of bounds on signal `\BB': Setting 1 MSB bits to undef. | 15:24 |
daveshah | ZirconiumX: AA needs to be WIDTH-1:0 not Y_WIDTH-1:0 | 15:24 |
daveshah | same for BB | 15:24 |
ZirconiumX | Woops, thank you | 15:25 |
*** emeb_mac has joined #yosys | 15:26 | |
*** rohitksingh has quit IRC | 15:31 | |
*** vonnieda has joined #yosys | 15:32 | |
*** emeb_mac has quit IRC | 15:34 | |
ZirconiumX | daveshah: While reading through the synth_ice40 pass, I noticed that the iCE40 uses DFFE cells. Is the E here an enable or something? | 15:39 |
daveshah | Yes | 15:39 |
daveshah | Clock enable | 15:39 |
*** rohitksingh has joined #yosys | 15:43 | |
ZirconiumX | daveshah: So from some research a DFFE is essentially a transparent latch? | 15:48 |
daveshah | ZirconiumX: No, that's a D latch | 15:48 |
daveshah | A DFFE is effectively a D flipflop with an AND gate on the clock (or a mux in front of the data input) | 15:48 |
tnt | "an AND gate on the clock" ... well ... don't implement it like that :p | 15:50 |
ZirconiumX | I'm confused, then | 15:51 |
tnt | If clock-enable is low, a dffe will ignore rising edges. | 15:51 |
ZirconiumX | https://cdn.eeweb.com/articles/quizzes/dff-1293487103_180201_061807.png <-- one of these? | 15:51 |
tnt | yes | 15:51 |
tnt | 74AC377 | 15:52 |
tnt | Octal D-Type Flip-Flop with Clock Enable | 15:52 |
*** rohitksingh has quit IRC | 15:52 | |
ZirconiumX | Sadly that particular part was not made in the AC family | 15:53 |
tnt | http://www.mouser.com/ds/2/149/74ac377-288934.pdf ? | 15:54 |
ZirconiumX | Oh, seems TI are lying then :P | 15:55 |
tnt | Well ... not all manufacturers make all parts in each family ... | 15:55 |
tnt | each has its unique set of gates they make in each family. | 15:55 |
ZirconiumX | True, I suppose | 15:56 |
*** rohitksingh has joined #yosys | 15:56 | |
ZirconiumX | Hmmm | 16:01 |
ZirconiumX | What advantage does a DFFE have over a plain DFF? | 16:02 |
ZirconiumX | Probably more flexibility, at least | 16:02 |
daveshah | It saves a mux, in situations when you only update the DFF sometimes | 16:02 |
daveshah | any HDL of the form if (a) q <= d maps to a DFFE nicely | 16:02 |
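The equivalence daveshah describes (a DFFE behaves like a plain DFF with a mux in front of its data input) can be sketched as a small behavioural model. This is only an illustration: the function names `dff_with_mux`, `dffe`, and `simulate`, and the stimulus values, are hypothetical and do not come from Yosys or any cell library.

```python
def dff_with_mux(q, d, en):
    """Plain DFF fed by a 2:1 mux: the mux recirculates q when en is low."""
    mux_out = d if en else q   # this is the mux a DFFE makes unnecessary
    return mux_out             # value captured at the rising clock edge


def dffe(q, d, en):
    """DFFE: the flop itself ignores the rising edge when en is low."""
    if not en:
        return q               # clock enable low: hold the stored value
    return d                   # clock enable high: capture d


def simulate(step, stimulus, q0=0):
    """Apply one (d, en) pair per clock edge and record q after each edge."""
    q = q0
    trace = []
    for d, en in stimulus:
        q = step(q, d, en)
        trace.append(q)
    return trace


# Hypothetical stimulus: (d, en) on five successive rising edges.
STIMULUS = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 1)]

if __name__ == "__main__":
    print(simulate(dffe, STIMULUS))
    print(simulate(dff_with_mux, STIMULUS))
```

Both models produce the same trace for any stimulus, which is why HDL of the form `if (a) q <= d` needs no separate mux when a DFFE cell is available.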
*** flammit_ has joined #yosys | 16:03 | |
ZirconiumX | Ah, I see | 16:03 |
*** citypw has quit IRC | 16:04 | |
*** nengel has joined #yosys | 16:07 | |
*** Wolf481pl has joined #yosys | 16:10 | |
*** rohitksingh has quit IRC | 16:11 | |
*** flammit has quit IRC | 16:11 | |
*** ZipCPU has quit IRC | 16:11 | |
*** Wolf480pl has quit IRC | 16:11 | |
*** attie has quit IRC | 16:11 | |
*** flammit_ is now known as flammit | 16:11 | |
*** ZipCPU has joined #yosys | 16:14 | |
*** rrika has quit IRC | 16:14 | |
*** rrika has joined #yosys | 16:14 | |
ZirconiumX | daveshah: It looks like dfflibmap can't match/create DFFE cells. Is this correct, or am I just blind? | 16:19 |
daveshah | Yeah, looks like no-one has ever implemented this | 16:20 |
daveshah | It might be that DFFEs are more common in an FPGA context, which doesn't use dfflibmap | 16:20 |
ZirconiumX | Presumably then I should use techmap for this instead? | 16:21 |
daveshah | Yes | 16:21 |
ZirconiumX | DFF vs DFFE in 74-series logic is going to be an interesting tradeoff that might not pay off; you're saving muxes, sure, but the 16373 lets you fit twice as many DFFs in a chip. | 16:23 |
ZirconiumX | Well, assuming my math on this (broken) pass is correct, it *should* be a fairly major gain | 17:08 |
ZirconiumX | https://github.com/ZirconiumX/74xx-liberty/commit/4fa6b83d | 17:09 |
tpb | Title: (broken) DFF to 74AC377 DFFE pass · ZirconiumX/74xx-liberty@4fa6b83 · GitHub (at github.com) | 17:09 |
ZirconiumX | This leaks $_DFFE_PP_ cells, though | 17:10 |
*** citypw has joined #yosys | 17:10 | |
*** citypw has quit IRC | 17:19 | |
*** proteusguy has joined #yosys | 17:28 | |
daveshah | ZirconiumX: you need a general techmap call (`-map +/techmap.v`) before you try to map the $_DFFE_PP_ | 17:29 |
ZirconiumX | daveshah: Ah, thank you | 17:31 |
ZirconiumX | Before: 7729 | 17:32 |
ZirconiumX | After: 6734 | 17:32 |
ZirconiumX | That's pretty huge | 17:32 |
daveshah | I would expect to see a significant drop in the number of MUX2s? | 17:32 |
ZirconiumX | Indeed, we go from 1,316 to 876 | 17:33 |
tnt | wiring 6700 chips is still going to be fun :p | 17:37 |
ZirconiumX | This is for the whole benchmark | 17:38 |
ZirconiumX | Biggest winner is axilxbar, with about 25% fewer gates | 17:40 |
ZirconiumX | PicoRV32 is currently at 1,532 gates | 17:41 |
*** rohitksingh has joined #yosys | 17:42 | |
ZirconiumX | daveshah: Actually, I just had a thought. Yosys would expect each individual DFFE to have its own enable bit, but the 74AC377 has a single enable bit for 8 flops | 17:45 |
ZirconiumX | So this would be technically incorrect, right? | 17:45 |
ZirconiumX | Or at least, modelled incorrectly | 17:45 |
daveshah | ZirconiumX: the iCE40 flipflops are similar (as are most FPGAs) | 17:48 |
daveshah | have a look at how tnt implemented dffe_min_ce_use in synth_ice40 | 17:48 |
ZirconiumX | Ah, thank you, daveshah | 17:54 |
ZirconiumX | It's still an improvement, but very much less so | 17:54 |
ZirconiumX | At 7563 chips, currently | 17:55 |
ZirconiumX | Adding an opt_merge before unmapping like synth_ice40 does helped bring that down to 7378 | 18:11 |
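The shared-enable constraint discussed above (eight flops per 74AC377, one enable pin per package) can be illustrated with a rough package count. This is a sketch under stated assumptions, not how `synth_ice40` or the 74xx-liberty flow works: `count_377_packages`, its `min_ce_use` threshold (named loosely after the `dffe_min_ce_use` option mentioned above), and the example netlist are all hypothetical.

```python
from collections import Counter
from math import ceil

FLOPS_PER_377 = 8  # a 74AC377 holds eight flops that share one enable pin


def count_377_packages(enable_of_flop, min_ce_use=1):
    """Estimate 74AC377 usage for a set of enable-flops.

    enable_of_flop maps a flop name to the name of its enable net.
    Flops can only share a package if they share an enable net.
    Groups with fewer than min_ce_use flops are left unmapped
    (they would fall back to a plain DFF plus a mux).
    Returns (packages, unmapped_flops).
    """
    group_sizes = Counter(enable_of_flop.values())
    packages = 0
    unmapped = 0
    for size in group_sizes.values():
        if size < min_ce_use:
            unmapped += size
        else:
            packages += ceil(size / FLOPS_PER_377)
    return packages, unmapped


# Hypothetical design: 10 flops on en_a, 3 on en_b, 1 on en_c.
EXAMPLE = {f"a{i}": "en_a" for i in range(10)}
EXAMPLE.update({f"b{i}": "en_b" for i in range(3)})
EXAMPLE["c0"] = "en_c"

if __name__ == "__main__":
    print(count_377_packages(EXAMPLE))                # every group gets its own chips
    print(count_377_packages(EXAMPLE, min_ce_use=2))  # lone flop left unmapped
```

Counting per enable group rather than per flop is what makes the gain smaller than the first estimate: a group with a single flop still occupies a whole package unless it is unmapped.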
*** maikmerten has joined #yosys | 18:12 | |
*** rohitksingh has quit IRC | 18:57 | |
ZirconiumX | I'm reading the "memory_bram" documentation (as Clifford suggested); what is a transparent read? | 18:58 |
ZirconiumX | (of SRAM) | 18:58 |
daveshah | A transparent read is where the read port will reflect writes in the current clock cycle (aka read after write) | 18:58 |
ZirconiumX | So if you write X to address Y on one port and simultaneously command a read from address Y, SRAM is transparent if you get X out? | 19:00 |
daveshah | Yes | 19:00 |
daveshah | Yosys can fake it with a mux if the SRAM isn't capable natively | 19:01 |
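The difference between read-before-write and transparent read, and the idea of faking transparency with a mux, can be sketched behaviourally. This is a minimal model under assumptions: `SyncRam` and `transparent_clock` are hypothetical names, and the circuit Yosys actually emits may differ.

```python
class SyncRam:
    """One read port, one write port, read-before-write behaviour."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def clock(self, raddr, we, waddr, wdata):
        """One clock edge: the read returns the OLD contents, then the write lands."""
        rdata = self.mem[raddr]
        if we:
            self.mem[waddr] = wdata
        return rdata


def transparent_clock(ram, raddr, we, waddr, wdata):
    """Wrap a read-before-write RAM so that reads appear transparent.

    When the read and write addresses collide in the same cycle,
    a bypass mux forwards the write data instead of the stale RAM output.
    """
    rdata = ram.clock(raddr, we, waddr, wdata)
    collision = we and raddr == waddr
    return wdata if collision else rdata


if __name__ == "__main__":
    plain = SyncRam(4)
    print(plain.clock(2, True, 2, 7))                 # old value at address 2
    bypassed = SyncRam(4)
    print(transparent_clock(bypassed, 2, True, 2, 7)) # freshly written value
```

The bypass only changes what the read port reports during a collision; the stored contents end up the same in both cases.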
ZirconiumX | The SRAM I'm looking at at the moment appears to stall the read if you do that | 19:01 |
ZirconiumX | Is that transparent? | 19:01 |
daveshah | That sounds like not transparent, ie read before write | 19:01 |
ZirconiumX | https://www.idt.com/document/dst/713242-datasheet | 19:02 |
ZirconiumX | My plan with this is to designate one port as write and one as read | 19:03 |
daveshah | I'm not sure if this really fits one way or another | 19:03 |
daveshah | Yosys doesn't have a concept of BRAM stalling - this wouldn't map from Verilog well either | 19:04 |
ZirconiumX | So this chip wouldn't work? | 19:04 |
*** rohitksingh has joined #yosys | 19:04 | |
daveshah | You'd probably need to be a bit clever with how you drove it | 19:05 |
daveshah | Read on one clock cycle and write on the other or something, so you didn't have the collision | 19:05 |
ZirconiumX | Yeah, it'd need some anti-collision circuitry | 19:07 |
ZirconiumX | Or even properties to verify collisions could not happen | 19:07 |
*** dys has quit IRC | 19:14 | |
*** dys has joined #yosys | 19:16 | |
ZirconiumX | So, I managed to coerce memory_bram into working by fooling it into thinking the write port is clocked | 19:32 |
ZirconiumX | 1,009 chips, even though the RAM chips are very underused | 19:33 |
*** jevinskie has joined #yosys | 19:40 | |
ZirconiumX | 6,140 ICs | 19:43 |
ZirconiumX | Oh, hey jevinskie | 19:44 |
jevinskie | Howdy! | 19:45 |
*** m4ssi has joined #yosys | 20:12 | |
*** m4ssi has quit IRC | 20:22 | |
bwidawsk | daveshah› what is a SLICE in this context after PNR? | 20:32 |
daveshah | bwidawsk: a unit of two LUT4s, two flipflops, two MUX2s and two bits of carry logic | 20:33 |
bwidawsk | ah, this is like an ALM in altera parlance | 20:34 |
bwidawsk | thanks | 20:34 |
daveshah | Yup | 20:34 |
*** maikmerten has quit IRC | 20:37 | |
bwidawsk | daveshah› interestingly, synthesis time alone is a decent amount faster on blinky with gcc over clang | 20:51 |
bwidawsk | roughly 25% faster (granted we're talking ~2s here) | 20:52 |
daveshah | Interesting | 20:56 |
daveshah | Would be good to know if that applies to bigger benchmarks too | 20:57 |
bwidawsk | daveshah› I'll try to provide that info after I figure out how to get the same data from diamond on blinky | 21:04 |
*** Thorn has quit IRC | 21:10 | |
*** Thorn has joined #yosys | 21:11 | |
bwidawsk | daveshah› actually, I had it backwards - clang is faster, and it's more like 10% | 21:29 |
bwidawsk | it does fluctuate a bit... | 21:29 |
*** SpaceCoaster has quit IRC | 21:40 | |
bwidawsk | not sure if I did something wrong, but diamond and yosys use the same number of luts, but diamond uses half the number of slices | 21:47 |
bwidawsk | daveshah› https://0x0.st/zedC.txt | 22:00 |
daveshah | This is probably because nextpnr's packing density is pretty poor | 22:01 |
daveshah | This isn't counting carries (CCU2Cs) which take up a slice too | 22:02 |
bwidawsk | daveshah› does it make sense to add that? | 22:03 |
daveshah | Yes | 22:03 |
bwidawsk | daveshah› in nextpnr side, it's already counted by TRELLIS_SLICE, correct? | 22:05 |
daveshah | Yes | 22:05 |
bwidawsk | that brings it up to 184 vs. 232 then | 22:05 |
* bwidawsk needs to add a cost column :P | 22:06 | |
daveshah | I was referring to the synthesis side, in terms of LUT usage | 22:06 |
daveshah | Diamond SLICEs should already include CCU2s too | 22:06 |
bwidawsk | I have this | 22:07 |
bwidawsk | Number of SLICEs: 117 out of 41820 (0%) | 22:07 |
bwidawsk | SLICEs as Logic/ROM: 117 out of 41820 (0%) | 22:07 |
bwidawsk | SLICEs as RAM: 0 out of 31365 (0%) | 22:07 |
bwidawsk | SLICEs as Carry: 67 out of 41820 (0%) | 22:07 |
bwidawsk | Number of LUT4s: 233 out of 83640 (0%) | 22:07 |
bwidawsk | Number used as logic LUTs: 99 | 22:07 |
bwidawsk | Number used as distributed RAM: 0 | 22:07 |
bwidawsk | Number used as ripple logic: 134 | 22:07 |
bwidawsk | Number used as shift registers: 0 | 22:07 |
daveshah | So that is equivalent to 117 TRELLIS_SLICE | 22:07 |
daveshah | I'm curious what the Yosys output is | 22:08 |
bwidawsk | daveshah› https://0x0.st/zedC.txt | 22:08 |
bwidawsk | oops | 22:08 |
bwidawsk | daveshah› https://0x0.st/zenc.txt | 22:09 |
daveshah | So the total number of LUTs Yosys has inferred is 233 + 74*2 | 22:10 |
daveshah | Yosys doesn't include the two LUT4s in the CCU2C in its statistic, whereas I believe Diamond does | 22:11 |
bwidawsk | so quite a bit worse then, huh? | 22:11 |
bwidawsk | let me post the entirety of the map output | 22:12 |
bwidawsk | daveshah› https://0x0.st/zenm.txt | 22:12 |
daveshah | Yeah, Yosys has some serious area issues for ECP5 at the moment | 22:13 |
daveshah | Mostly because the lack of proper LUT timings in ABC makes it much too eager to use muxes to build large LUTs | 22:14 |
bwidawsk | if I'm trying to paint open tools in a good light, should I do ice40 then? | 22:15 |
daveshah | I expect you'll find it much the same | 22:15 |
bwidawsk | :/ | 22:15 |
daveshah | Probably a bit better, but we still definitely lag behind | 22:16 |
daveshah | Things will pick up once this PR is merged https://github.com/YosysHQ/yosys/pull/1098 | 22:16 |
tpb | Title: WIP "abc9" pass for timing-aware techmapping (experimental, FPGA only, no FFs) by eddiehung · Pull Request #1098 · YosysHQ/yosys · GitHub (at github.com) | 22:16 |
daveshah | But it might be a month or two | 22:16 |
bwidawsk | for my sake, would it make sense to just merge it and try that out? | 22:17 |
bwidawsk | I don't care if it's in master so long as I can say in good faith, it will be | 22:17 |
daveshah | It's probably not giving a massive improvement yet, there is still some work that isn't even pushed | 22:18 |
daveshah | If you do try it, you'll need to add -abc9 to synth_ecp5 and synth_ice40 | 22:18 |
bwidawsk | daveshah› actually, if you read the notes from the diamond log, it seems like they misreport the total number of luts | 22:18 |
bwidawsk | Notes:- | 22:18 |
bwidawsk | 1. Total number of LUT4s = (Number of logic LUT4s) + 2*(Number of | 22:18 |
bwidawsk | distributed RAMs) + 2*(Number of ripple logic) | 22:18 |
bwidawsk | it's a bit confusing | 22:19 |
daveshah | That is equivalent to in Yosys doing LUT4 count + 2*CCU2 count | 22:19 |
daveshah | If you want to see Yosys doing better in area, you can try adding -nomux to synth_ecp5 | 22:20 |
bwidawsk | for diamond then, i should be doing 99 + 2 * 134, correct? | 22:20 |
daveshah | No the Diamond number of LUT4s is correct | 22:21 |
daveshah | Yosys has done badly and there's no escaping | 22:21 |
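The two LUT4 counts being compared can be worked through with the figures quoted above. One assumption is made here and should be treated as such: the "Number of ripple logic" term in the Diamond note is read as the number of carry cells (the 67 "SLICEs as Carry"), since the 134 ripple-logic LUTs reported equal 2 × 67. The function names are hypothetical.

```python
def diamond_lut4_total(logic_luts, distributed_rams, carry_cells):
    """Diamond note: logic LUT4s + 2*(distributed RAMs) + 2*(ripple logic).

    ASSUMPTION: 'ripple logic' in the note counts carry cells, each of
    which contains two LUT4s.
    """
    return logic_luts + 2 * distributed_rams + 2 * carry_cells


def yosys_lut4_total(lut4_cells, ccu2c_cells):
    """Yosys reports LUT4 cells separately; each CCU2C adds two more LUT4s."""
    return lut4_cells + 2 * ccu2c_cells


# Figures taken from the log above.
DIAMOND_TOTAL = diamond_lut4_total(logic_luts=99, distributed_rams=0, carry_cells=67)
YOSYS_TOTAL = yosys_lut4_total(lut4_cells=233, ccu2c_cells=74)

if __name__ == "__main__":
    print("Diamond LUT4s:", DIAMOND_TOTAL)
    print("Yosys LUT4s (Diamond terms):", YOSYS_TOTAL)
```

Under that reading, Diamond's reported total of 233 is self-consistent, and the like-for-like Yosys figure is 381 rather than 233.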
bwidawsk | and presumably, Fmax goes up if you fix those somewhat | 22:22 |
*** AlexDaniel has quit IRC | 22:23 | |
daveshah | Yes | 22:24 |
*** fsasm has quit IRC | 22:24 | |
daveshah | Do you have the design somewhere? | 22:25 |
bwidawsk | it's your blinky from prjtrellis | 22:25 |
daveshah | bwidawsk: I'll push a PR tomorrow, turns out the subtractor mapping in Yosys was very suboptimal and hurting that design particularly badly | 22:41 |
daveshah | Should be down to more like 284 LUT4s in Diamond terms | 22:42 |
bwidawsk | daveshah› thanks! | 22:42 |
bwidawsk | if you add me to the cc, I would be happy to test it | 22:42 |
*** vonnieda has quit IRC | 22:50 | |
*** jevinskie has quit IRC | 22:59 | |
*** jevinskie has joined #yosys | 22:59 | |
bwidawsk | sadly the soc_ecp5_evn project doesn't just work in diamond | 23:14 |
*** rohitksingh has quit IRC | 23:25 | |
bwidawsk | looks like it doesn't like how EHXPLLL is instantiated | 23:41 |
bwidawsk | it all looks right afaict... | 23:53 |