Saturday, 2019-08-10

*** tpb has joined #yosys00:00
*** pie_ has quit IRC02:42
*** pie_ has joined #yosys02:46
*** pie_ has quit IRC02:48
*** PyroPeter has quit IRC03:12
*** PyroPeter has joined #yosys03:25
*** citypw has joined #yosys03:32
*** Thorn has joined #yosys04:42
*** _whitelogger has quit IRC06:03
*** _whitelogger has joined #yosys06:05
*** _whitelogger has quit IRC06:21
*** _whitelogger has joined #yosys06:23
*** rohitksingh has joined #yosys06:41
*** emeb_mac has quit IRC07:29
*** Jybz has joined #yosys07:29
*** dys has joined #yosys08:07
*** s_frit has quit IRC08:41
*** s_frit has joined #yosys08:41
*** rohitksingh has quit IRC08:50
*** _whitelogger has quit IRC08:57
*** AlexDaniel has quit IRC08:58
*** _whitelogger has joined #yosys08:59
*** kerel has left #yosys09:05
*** rohitksingh has joined #yosys09:09
*** rohitksingh has quit IRC09:19
*** rohitksingh has joined #yosys09:59
*** rohitksingh has quit IRC10:24
*** rohitksingh has joined #yosys11:05
*** rohitksingh has quit IRC11:37
*** Thorn has quit IRC11:57
*** AlexDaniel has joined #yosys12:04
*** janrinze has joined #yosys12:25
janrinzehi. after the relut issues performance was restored but now the latest master seems to have a significant performance reduction again. Apparently relut has been removed and a new strategy has replaced it. Anyone else seeing this too?12:27
janrinzedoing a rebuild of yosys now to see if it was caused by a build issue. (sometimes changes don't propagate fully and a rebuild is required after git pull.)12:29
tntrelut was not removed. the bug was just fixed. (two things were merged at once and they conflicted to create a perf issue)12:30
*** pepijndevos[m] has joined #yosys12:31
*** rrika has quit IRC12:32
*** rohitksingh has joined #yosys12:33
*** rrika has joined #yosys12:35
*** X-Scale has quit IRC12:38
*** rohitksingh has quit IRC12:43
janrinzetnt: did you see commit d5e8c0e6d33de71493855eca72fcc454a67a6140 ?12:47
*** rohitksingh has joined #yosys12:51
tntArf, no I had.12:54
tntnot12:55
janrinzetnt: care to try out the latest master? I'm curious if you see similar results13:04
tntyeah, already building it ...13:05
tntbut my laptop is like 5y old, it takes a while :p13:05
janrinzeyeah, i can relate.13:06
janrinzetnt: i have a simple design that ran 100MHz and now can only reach 68MHz. that's quite a setback.13:10
tntI can top that.13:11
tntERROR: timing analysis failed due to presence of combinatorial loops, incomplete specification of timing ports, etc.13:11
tntIt went from working to ... not working.13:11
tntso yeah, I'd say yosys master is broken ATM13:13
janrinzecommit ea8ac8fd7484cc7c3b8929ae339f9aeb49403c36 too13:14
janrinzeoh, the presence of combinatorial loops messages have been bugging me too recently.13:14
janrinzeThought it was my design but clearly it's not just me :-)13:15
janrinze70% speed is quite annoying.13:15
janrinzeMy cpu went back from 52MHz to 34.5 !13:16
janrinzein all quite a bit of regression i.r.t. speed13:17
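The two regressions quoted above can be put side by side with a little arithmetic. This is only a sketch using the Fmax figures janrinze reports (100 to 68 MHz for the simple design, 52 to 34.5 MHz for the cpu); the function and variable names are made up for illustration.

```python
def speed_ratio(before_mhz: float, after_mhz: float) -> float:
    """Return the new Fmax as a fraction of the old Fmax."""
    return after_mhz / before_mhz


# Figures quoted in the conversation, in MHz.
simple_design = speed_ratio(100.0, 68.0)
cpu = speed_ratio(52.0, 34.5)

print(f"simple design: {simple_design:.0%} of the previous Fmax")
print(f"cpu:           {cpu:.0%} of the previous Fmax")
```

Both designs land a little under the "70% speed" mentioned in the channel.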
janrinzeIf it has been replaced then I wonder if there is a new flag to be used for optimization.13:19
janrinzetnt: with a build of the last commit on aug 7 everything is okay?13:23
tntSo I found a way to make it pass ... but the design is now 800 LCs larger (2700 -> 3500) and ~ 10% slower.13:31
tntwhich commit do you want me to test ?13:31
janrinzecommit f69410daaf68cd3cef5e365df9b27c623ce589a7 should be the last one of aug 713:34
janrinzetnt: i'm building that one now too and hopefully will see what the differences are.13:36
*** pepijndevos[m] has left #yosys13:38
*** pepijndevos[m] has joined #yosys13:38
*** AlexDaniel has quit IRC13:50
tntjanrinze: seems to work better ... fmax is a bit lower but that's probably just randomness ..14:02
tntLCs at least are back to "normal" (just 3 LCs more)14:02
tntnextpnr still crashes at the end with "terminate called after throwing an instance of 'std::out_of_range'"14:03
tntmaybe there was some incompatible change between yosys / nextpnr ...14:03
*** emeb has joined #yosys14:03
daveshahThere was a breaking change in the JSON, new nextpnr with old JSON (where there is a problem, mostly ecp5) will give a sensible error but it wasn't possible to do anything with new JSON and old nextpnr14:07
daveshahAlthough annoying it finally means we have unambiguous parameters in all cases14:07
tntRebuilding nextpnr now. But yeah, looks like f69410daaf68cd3cef5e365df9b27c623ce589a7 is fine. But master is not.14:10
*** develonepi3 has joined #yosys14:29
janrinzetnt: just tested commit f69410daaf68cd3cef5e365df9b27c623ce589a7 with my simple design and it gives 100 MHz again.15:11
tntac2fc3a144fe1094bedcc6b3fda8a498ad43ae76 is what screws it up for me.15:13
*** X-Scale has joined #yosys15:27
tntjanrinze: feel free to open an issue on github15:29
*** emeb has quit IRC15:30
janrinzetnt: looking for an example that is small enough to show in an issue.15:30
*** emeb_mac has joined #yosys15:33
*** Thorn has joined #yosys15:37
tntgiven the merge branch name that screws it up is ice40_full_adder, I'd say anything with an adder would be bad :p15:45
*** rohitksingh has quit IRC16:04
janrinzetnt: the unlut part was intended to undo lut allocation by abc and allow further optimization of carry and lut. It seems that abc9 now supports full adder and will emit those. Unfortunately it seems it's not very smart about it.16:06
daveshahI don't think abc9 is emitting full adders, just perhaps optimising around them16:09
daveshahI'm not even sure what that PR was about, I suspect it might be best to revert it16:10
tntIt's https://github.com/YosysHQ/yosys/pull/1266  but ... doesn't explain much of the "why"16:11
tpbTitle: Wrap SB_LUT+SB_CARRY into $__ICE40_CARRY_WRAPPER by eddiehung · Pull Request #1266 · YosysHQ/yosys · GitHub (at github.com)16:11
daveshahWith the somewhat intricate carry structure in the iCE40, it's easy to trigger edge cases that result in ridiculous amounts of feed throughs being generated in an attempt to legalise them16:12
*** rohitksingh has joined #yosys16:21
daveshahFYI https://github.com/YosysHQ/yosys/pull/128016:26
tpbTitle: Revert "Wrap SB_LUT+SB_CARRY into $__ICE40_CARRY_WRAPPER" by daveshah1 · Pull Request #1280 · YosysHQ/yosys · GitHub (at github.com)16:26
*** citypw has quit IRC16:39
pepijndevosWhat's the most gate-efficient way to write synchronous logic expressions? I currently have a big ball of nested if and case statements.16:46
*** emeb_mac has quit IRC16:46
tntpepijndevos: my best results have come from either describing it as close as possible to the exact logic I want (i.e. do the synthesis in my head and describe that), or describing it as bare logic equations ( OR of ANDs ) that I externally minimized.16:56
*** rohitksingh has quit IRC17:05
*** AlexDaniel has joined #yosys17:46
janrinzedaveshah: i'm building yosys from branch revert-1266-eddie/ice40_full_adder now to see if the performance regression is fixed with that too.17:53
*** rohitksingh has joined #yosys17:53
pepijndevostnt, so far the pieces of logic I managed to extract as asynchronous assignments were definitely smaller than what I had.18:14
tntpepijndevos: I'm not surprised :)18:15
tntpepijndevos: is the code you're working on somewhere public btw ?18:16
pepijndevosSo basically I want to reach a situation where my sequential process is JUST unconditional assignments.18:16
pepijndevostnt, https://github.com/pepijndevos/seqpu/blob/master/cpu.vhd18:17
tpbTitle: seqpu/cpu.vhd at master · pepijndevos/seqpu · GitHub (at github.com)18:17
tntyou want to minimize them. and especially you want to minimize the dependencies on signals that don't matter.18:26
tntwith nested if and conditions it's easy to hardcode a 'priority' or force a signal in a state in some case where in fact in that case the actual value doesn't matter.18:28
pepijndevosRight18:28
pepijndevosWhat do you mean by "minimize" though?18:28
pepijndevosAnd also... for example the next state of b is quite complicated, so it's not trivial to turn all of them into an async assignment18:29
tntI mean do as best you can :)18:35
tntSomething that can help as well (especially if you're shooting for area), is to extract the wide muxes and only generate the control signal in the large switch.18:37
tntFor instance if B can only take 4 different values, but which one is complex, you manually create the mux and you only generate the 'selection' signal in the large case.18:38
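The restructuring tnt describes can be modelled in software to see that it preserves behaviour. The sketch below is a Python model, not HDL and not taken from pepijndevos' cpu.vhd: the opcodes, the `flag` input and the four candidate values for B (`b`, `acc`, `imm`, constant 0) are all hypothetical. The first function mixes control and data in one decision tree; the second form keeps the decision tree but has it produce only a 2-bit select, with a single explicit 4-way mux choosing the value.

```python
from itertools import product


def next_b_nested(op, flag, b, acc, imm):
    """Original style: control decisions and data values in one tree."""
    if op == 0:
        return b
    elif op == 1:
        return acc if flag else b
    elif op == 2:
        return imm
    else:
        return 0 if flag else acc


def select_b(op, flag):
    """The large case now only generates the mux selection signal."""
    if op == 0:
        return 0
    elif op == 1:
        return 1 if flag else 0
    elif op == 2:
        return 2
    else:
        return 3 if flag else 1


def mux4(sel, v0, v1, v2, v3):
    """One explicit 4-way mux for the data path."""
    return (v0, v1, v2, v3)[sel]


def next_b_split(op, flag, b, acc, imm):
    return mux4(select_b(op, flag), b, acc, imm, 0)


def equivalent():
    """Exhaustively compare both forms over small input ranges."""
    cases = product(range(4), (False, True), range(4), range(4), range(4))
    return all(
        next_b_nested(op, flag, b, acc, imm) == next_b_split(op, flag, b, acc, imm)
        for op, flag, b, acc, imm in cases
    )


print(equivalent())
```

In an HDL the point of the split is that the wide data values pass through one mux, while the case statement only has to produce a narrow select signal.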
pepijndevosAh I see18:45
pepijndevosMaking progress...18:46
pepijndevosIf someone just made a "sufficiently smart" compiler...18:48
janrinzepepijndevos: do you use vhdl with yosys?18:57
tntjanrinze: did it work ?19:01
tntjanrinze: (and the commercial version of yosys has a vhdl frontend)19:02
janrinzedaveshah: f.w.i.w. the branch revert-1266-eddie/ice40_full_adder produces 100 MHz results again for my design.19:02
pepijndevosjanrinze, YES! I'm using GHDL for everything except formal at the moment.19:03
pepijndevosFormal verification with GHDL is the next thing on my todo list :)19:03
janrinzepepijndevos: GHDL? I should take a look at that. Does it translate to verilog in the backend?19:04
*** s_frit has quit IRC19:04
*** rohitksingh has quit IRC19:04
*** s_frit has joined #yosys19:05
tntpepijndevos: oh interesting, I thought you were using verific.19:05
janrinzepepijndevos: GHDL seems to be only a simulator. How do you use that with yosys?19:05
daveshahjanrinze: if your design is public, could you add a link to it and a comment about the Fmax details on the revert PR?19:05
janrinzedaveshah: it's a tta cpu, I'm still working on it. No real stuff to test or verify yet.19:08
daveshahNo worries19:08
janrinzedaveshah: TTA is very attractive for small FPGAs. It's a 16 bit cpu with 8 KB program and 8 KB data memory, and a 32 bit ALU.19:09
janrinzedaveshah: the core of the cpu is one instruction : mov Rd,Ra19:10
janrinzedaveshah: Rd and Ra are 8 bit references to a register in any of the 16 module slots.19:11
janrinzedaveshah: so each module has 16 write and 16 read registers.19:11
*** rohitksingh has joined #yosys19:12
janrinzedaveshah: It's an experiment in TTA cpus that i wanted to try. With the HX8K it can run at 100 MHz!19:12
tntjanrinze: did you try an up5k ? (just to see fmax)19:14
janrinzetnt: my 16bit RISC runs on both hx8k and up5k. the up5k being about 50% the speed of the hx8k19:16
janrinzetnt: I'll check the timings for up5k19:16
tntHow's the LC count ?19:17
janrinzetnt: for the TTA or the RISC16?19:19
tntBoth I guess :p19:20
janrinzetnt: the TTA is ICESTORM_LC  1693/ 7680    22% , using 1 16x16 registerfile , 1 load/store module and 1 ALU (32 bit).19:28
janrinzetnt: on the up5k that is 1693/5280 or 32%19:29
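The utilisation percentages above follow directly from the logic cell counts janrinze quotes (1693 LCs used, 7680 available on the HX8K, 5280 on the up5k). A small sketch of the arithmetic, with illustrative names:

```python
def lc_utilisation(used: int, available: int) -> float:
    """Return the fraction of logic cells in use."""
    return used / available


# Logic cell counts as quoted in the conversation.
HX8K_LCS = 7680
UP5K_LCS = 5280
TTA_LCS = 1693

tta_on_hx8k = lc_utilisation(TTA_LCS, HX8K_LCS)
tta_on_up5k = lc_utilisation(TTA_LCS, UP5K_LCS)

print(f"TTA on HX8K: {tta_on_hx8k:.0%}")
print(f"TTA on up5k: {tta_on_up5k:.0%}")
```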
janrinzetnt: up5k has 128KB SPRAM but only 15 KB Block RAM.19:31
tntyeah I know :)19:32
*** X-Scale has quit IRC19:35
*** rohitksingh has quit IRC19:40
janrinzetnt: the up5k has a disappointing 40MHz result for the TTA.19:56
tntok.19:57
janrinzetnt: on the icoboard I have not been able to get sram to work reliably above 35 MHz. Something to do with the pin delays i think.19:58
daveshah40MHz is a pretty good going for a up5k, tbh19:58
janrinzedaveshah: yes, i think it's about the limit for the up5k with anything practical20:00
tntjanrinze: as soon as you go outside bidirectionally you need to take all the IO delays into account and AFAIK nextpnr doesn't provide output clock-to-out or input setup/hold numbers.20:00
janrinzetnt: the RISC16 SoC takes ICESTORM_LC 3776/ 7680    49%20:01
tntThat's a bit large for my taste :p20:01
janrinzetnt: unfortunately there is no way to tell the toolchain that the address to data delay is 10ns20:01
janrinzetnt: that SoC has VGA 512x384 2bit , HW MUL, HW DIV, SPI access to SDcard etc..20:02
janrinzetnt: in only 49% of the HX8K I think that is a respectable 'small' size.20:03
tntAh oki that's not just the cpu.20:03
janrinzetnt: it's about the same as a home computer from the 80's20:04
tntalso hw mul/div ... I guess you're not using the dsp ?20:04
janrinzeOn the up5k I do use the DSP. Not on the HX8K20:04
tntobviously :p20:04
janrinzedaveshah: will we see some tooling for defining I/O delay in the near future?20:06
daveshahProbably not, it's nowhere near the top of the todo list20:06
tntjanrinze: you can't really _define_ IO delays btw, the tool would just report them mostly.20:06
tntIt's up to you to actually 'by design' make it so it works for you.20:07
daveshahI'm not even sure if we have enough data in icebox, tbh20:07
daveshahWe have some slightly questionable numbers for the IO primitives but ime the vendor tools use numbers depending on voltage and load capacitance20:07
tntfirst thing is to always use IO registers for in/out, this way those delays are constant.20:07
janrinzetnt: the SoC can run at 58MHz if no external RAM is used. Unfortunately 16KB is way too small for a SOC with a VGA framebuffer.20:07
tntjanrinze: not sure if you've seen but : https://github.com/smunaut/ice40-playground/blob/master/cores/video/doc/text-mode.md20:08
tpbTitle: ice40-playground/text-mode.md at master · smunaut/ice40-playground · GitHub (at github.com)20:08
daveshahYeah as tnt says I don't think the RAM issue is one the tool would be able to fix itself20:09
tntThat's a "text mode" (with user definable glyphs) core for the up5k using the SPRAMs.20:09
daveshahBut better timing analysis would at least report issues20:09
janrinzetnt: text mode would work, I guess. Still the code from the ROM is already 16KB..20:09
janrinzeI'll see if i can hack up a rom that is 8KB and put the rest (vga txt, stack etc.) in the other 8k.20:10
tntjanrinze: use one spram for the code :) (that's what I do. I have a minimal boot code in ROM that loads the rest from SPI to the spram)20:10
daveshahCould you add wait states to run the external RAM slower than the CPU core?20:11
janrinzetnt: on the up5k it runs at 22MHz and has 128KB SPRAM available. no external chips20:11
daveshahAnd then put speed critical stuff available in internal ram20:11
tntjanrinze: ah oki.20:11
janrinzedaveshah: waitstates could work but i think cache has a better chance of getting higher speeds to work.20:12
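As a rough illustration of the wait-state idea, the sketch below estimates how many wait states an external access would need. It rests on assumptions not confirmed in the channel: that each wait state stretches an access by exactly one core clock, and that the 58 MHz core figure and the 35 MHz SRAM limit janrinze mentions can be combined this way. The function names are hypothetical.

```python
import math


def min_wait_states(core_mhz: float, mem_max_mhz: float) -> int:
    """Smallest number of extra core cycles per access so that the
    external access rate stays at or below mem_max_mhz.
    Assumes one wait state adds exactly one core clock per access."""
    cycles_per_access = math.ceil(core_mhz / mem_max_mhz)
    return max(0, cycles_per_access - 1)


def access_rate_mhz(core_mhz: float, wait_states: int) -> float:
    """Resulting external access rate for a given wait-state count."""
    return core_mhz / (wait_states + 1)


# Figures from the conversation: 58 MHz core, SRAM reliable up to about 35 MHz.
ws = min_wait_states(58.0, 35.0)
print(ws, access_rate_mhz(58.0, ws))
```

Under these assumptions one wait state is enough, giving 29 MHz on the external bus while the core keeps its 58 MHz clock. It says nothing about how much a cache, as janrinze prefers, would help.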
pepijndevosjanrinze, look at ghdlsynth-beta20:12
janrinzepepijndevos: yes already found that but no time yet to go through it.20:13
pepijndevosIt's extremely beta20:15
janrinzeI noticed it said that in the README too :-D20:15
pepijndevoshrm... I rewrote all of my code as async with just unconditional assignments, and it turns out that yosys is smarter than me on these last bits.20:16
janrinzepepijndevos: there is a lot of vhdl around that would be interesting to test with yosys.20:16
pepijndevosCertainly20:16
pepijndevosI know a few people who'd like to run popual retro computing simulators on yosys :)))20:17
pepijndevos*popular20:17
janrinzepepijndevos: I tried to write 'efficient' verilog in the past but noticed that yosys usually is very smart to compile high level verilog. So usually no need to worry about the code.20:18
pepijndevosYea... usually, but a few bits that I moved outside of the process saved a lot.20:19
janrinzepepijndevos: I've been doing retro computing for 15 years now. Specifically 6502 based and ARM320:19
janrinzepepijndevos: I designed a few CPUs to do some alternative computing platforms. Something between an Apple II GS and an Acorn Archimedes.20:20
janrinzetnt: any experience with ecp5?20:21
janrinzetnt: The amount of BRAM seems very adequate for my SoC20:22
tntjanrinze: nope, not yet ... I plan to get acquainted with it during CCC camp in a couple of weeks.20:22
tntdaveshah: will you be at camp btw ?20:23
* pepijndevos jealous20:23
daveshahNo, I'm not20:23
pepijndevosI completely missed out on camp ticket sale20:23
janrinzedaveshah: any quick pointers for using nextpnr-ecp5 instead of nextpnr-ice40 ? I'd like to get a feel for what the ecp5 can do20:42
daveshahjanrinze: this is a minimal example for the versa: https://github.com/SymbiFlow/prjtrellis/tree/master/examples/versa5g20:44
tpbTitle: prjtrellis/examples/versa5g at master · SymbiFlow/prjtrellis · GitHub (at github.com)20:44
daveshahThis is a small picorv32-based SoC: https://github.com/SymbiFlow/prjtrellis/tree/master/examples/soc_versa5g20:44
tpbTitle: prjtrellis/examples/soc_versa5g at master · SymbiFlow/prjtrellis · GitHub (at github.com)20:44
janrinzedaveshah: thanks. I just noticed i need to recompile nextpnr-ecp5 too.20:49
janrinzebuilding nextpnr-ecp5 is for the people who are patient and willing to close all other apps to ensure it does not run out of memory :-D21:13
*** emeb_mac has joined #yosys22:03
*** Jybz has quit IRC22:43
*** dys has quit IRC23:35
