Friday, 2019-02-15

*** tpb has joined #yosys00:00
corecodedaveshah: sorry to keep asking you00:51
corecodehow do i find the extra_bits_db (padin_glb_netwk)00:51
corecodehm, is it as easy as reading this file?00:53
corecodeyea i see numbers that are very similar, but one off, like for the 1k00:55
*** seldridge has quit IRC01:28
*** emeb has left #yosys01:38
corecodeheh, icecube says Internal Error: Assumption 'arch->IsPlaceable(y, x)' failed in plGraphConverter.cpp line 135901:38
corecodewhen placing a DFF at tile (7, 20)01:38
*** emeb_mac has joined #yosys01:42
*** citypw has joined #yosys02:02
*** gsi__ has joined #yosys02:12
*** gsi_ has quit IRC02:15
*** leviathanch has joined #yosys02:40
*** seldridge has joined #yosys03:16
*** rohitksingh has joined #yosys03:58
*** seldridge has quit IRC04:06
promachZipCPU: see https://i.stack.imgur.com/BQd2X.png04:22
*** pie___ has joined #yosys04:29
*** rohitksingh has quit IRC04:31
*** pie__ has quit IRC04:32
*** jevinskie has joined #yosys04:39
*** rohitksingh_work has joined #yosys04:41
*** promach has quit IRC04:51
*** promach has joined #yosys05:22
*** m_w has quit IRC05:50
*** develonepi3 has quit IRC07:03
*** emeb_mac has quit IRC07:13
*** rohitksingh_work has quit IRC07:55
*** rohitksingh_work has joined #yosys07:58
*** dys has quit IRC08:06
*** awordnot has quit IRC08:17
*** awordnot has joined #yosys08:19
*** leviathanch has quit IRC08:42
promachZipCPU: see the updated version https://i.imgur.com/oPAVWdh.png08:44
promachwait, I just found another corner case :|09:04
promachZipCPU: I have just updated https://gist.github.com/promach/5f2d9a9494704ed93cf65687c982198c#file-multiply-v09:19
tpbTitle: A signed multiply verilog code using row adder tree multiplier and modified baugh-wooley algorithm · GitHub (at gist.github.com)09:19
daveshahcorecode: yeah, just try a SB_GB_IO (or internal oscillator, probably for two of the cases) at each global input location09:19
daveshahand see which extra_bit appears in the asc file09:19
*** rohitksingh_work has quit IRC09:49
*** rohitksingh_work has joined #yosys09:52
*** citypw has quit IRC09:59
sxpertpromach: image is gone10:12
promachsxpert: see the gist, wait let me update the image10:14
sxpertah ok10:15
*** rohitksingh_work has quit IRC10:16
promachthere is still a bug when A_WIDTH != B_WIDTH10:24
sxpertI see10:27
promachsxpert: ok, it works now10:35
sxpertthe url probably changed10:37
promachwait, let me update the code. give me 15 minutes10:37
tntI'm wondering if there are algorithms that specifically target LUT4 arch so that each layer uses all 4 inputs of the LUT and not just 2.10:40
tntlike doing the partial multiply 2 bits at a time instead of 1 bit at a time.10:41
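
A minimal Python sketch of the idea tnt describes: consuming two multiplier bits per step (radix 4) instead of one. The name `mul_radix4`, the `width` parameter, and the unsigned-only handling are assumptions for illustration. It models the arithmetic only and says nothing about how a synthesis tool would pack it into LUT4s.

```python
def mul_radix4(a: int, b: int, width: int = 8) -> int:
    """Multiply unsigned a by unsigned b, consuming b two bits at a time.

    b is assumed to fit in `width` bits.
    """
    if width % 2:
        width += 1  # pad to an even number of multiplier bits
    acc = 0
    for i in range(0, width, 2):
        digit = (b >> i) & 0b11   # radix-4 digit of b, in the range 0..3
        partial = a * digit       # in hardware: select one of 0, a, 2a, 3a
        acc += partial << i       # weight the partial product by 2**i
    return acc
```

Compared with one bit per step, this halves the number of partial products, at the cost of needing the 3a multiple.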
*** leviathanch has joined #yosys10:44
promachsxpert : https://gist.github.com/promach/5f2d9a9494704ed93cf65687c982198c#file-multiply-v10:47
tpbTitle: A signed multiply verilog code using row adder tree multiplier and modified baugh-wooley algorithm · GitHub (at gist.github.com)10:47
promachtry it out and see whether this meets your needs10:48
promachand probably find out some corner cases that my assert() is not capable of finding10:48
promachsxpert : just try it out first10:48
promachprobably I will need to solve the induction bugs before knowing which assert() I had missed10:51
sxpertas ZipCPU would say, have you tried formal methods ?11:04
*** rohitksingh_work has joined #yosys11:04
promachsxpert : induction is part of yosys-smtbmc11:10
promachand yosys-smtbmc is a formal tool11:11
promachand induction can help find bugs within the actual verilog source code itself as well as the formal code11:24
promachsxpert : https://i.imgur.com/BLYZTi6.png should stay valid until induction shows me otherwise later11:31
*** s_frit has quit IRC11:32
*** s_frit has joined #yosys11:33
promachuse pencil and paper method to check this countermeasure11:33
promachI cannot be 100 percent sure about the correctness of this countermeasure since I had not done a rigorous maths proof about this11:34
promachand the code had not passed induction yet11:34
*** rohitksingh_work has quit IRC11:47
corecodeyea i don't know what the colbufs are, so i don't know how to look for them in the files11:55
*** s_frit has quit IRC12:10
*** s_frit has joined #yosys12:11
*** jevinskie has quit IRC12:15
tntcorecode: AFAIR they're buffers that distribute global networks to various parts of the chip, and if some global isn't needed in some area, you can disable those buffers to save power.12:42
tntand I think nextpnr currently just enables them all, regardless of whether they're actually needed.12:42
*** s_frit has quit IRC13:19
*** s_frit has joined #yosys13:20
*** rohitksingh has joined #yosys13:26
*** jevinskie has joined #yosys14:23
*** citypw has joined #yosys14:41
*** seldridge has joined #yosys14:56
*** jevinskie has quit IRC15:26
*** maikmerten has joined #yosys16:36
*** jevinskie has joined #yosys16:38
*** develonepi3 has joined #yosys16:39
*** gsi__ is now known as gsi_17:02
*** rohitksingh has quit IRC17:04
*** gruetzkopf has quit IRC17:13
*** kerel has quit IRC17:13
*** gruetzkopf has joined #yosys17:14
develonepi3daveshah: I have seen your presentation several times and have enjoyed it very much. Your comments that "Most FPGA Development use closed-source tools, FPGA vendors don't document bitstreams." are right on point. You and others, such as ZipCPU and Clifford Wolf, have advanced the FPGA discipline more in the past few years than others in decades. I think that we are now at the cusp where more people will start using FPGAs. I have been working in17:15
develonepi3Compressing Numerical Meteorological Modeled Data for many years. This work, using the Karhunen-Loeve transform (KLT) in the vertical direction and JPEG 2000 on XY slices, has been abandoned. I recently started working on Bare Metal for the Raspberry Pi3B+ using Ultibo. I think this is now achievable with your ECP5 efforts and the Raspberry Pi3B+ running Bare-Metal.17:15
*** kerel has joined #yosys17:15
*** m4ssi has quit IRC17:35
*** jevinskie has quit IRC17:37
*** rohitksingh has joined #yosys17:40
corecodei doubt you'd see a performance improvement for compute between running linux or ultibo17:46
ZipCPUcorecode: I'm curious why you'd say that17:50
*** develonepi3 has quit IRC18:24
corecodeZipCPU: because typically compute means that there is no kernel executing for most of the time18:28
*** emeb has joined #yosys18:35
*** dys has joined #yosys18:39
*** develonepi3 has joined #yosys18:41
ZipCPU... and, go on18:43
corecodegiven that the kernel doesn't run much at all, it is unlikely that you see performance differences18:44
ZipCPUSorry, I guess I misread your response.  You meant between Linux and Ultibo, and I thought you meant (Linux and Ultibo) vs FPGA18:45
corecodeoh no18:45
daveshahFWIW, if this is floating point heavy then there's little chance of the ECP5 beating the Pi, you'd probably need something much fancier, unless you are very clever about how you describe it18:47
corecodehi daveshah18:47
daveshahotoh I can easily see the ECP5 winning if you get it fixedpoint/integer18:47
daveshahhi corecode!18:47
corecodeyou're just the guy i was looking for18:47
corecodei'm trying to button up this icestorm stuff18:47
ZipCPUdaveshah: When I last examined the algorithm, it was I/O (i.e. SDRAM) bound18:47
corecodewhat am i looking for in the colbuf_logic output?18:48
corecodebecause i got it running (except for one tile)18:48
corecodebut now i don't know what i am looking for18:48
daveshahYou should see 4-tuples (colbuf_x, colbuf_y, user_x, user_y)?18:48
daveshahHopefully colbuf_x and user_x are the same18:49
corecodeno, that must be a different script18:49
corecodeyou mean colbuf_io?18:49
corecodethere are 3 different colbuf scripts18:49
daveshahah, I think you might need to use colbuf.py to parse the colbuf_logic output18:49
daveshahie, pass all the .exp files created by colbuf_logic to colbuf.py18:50
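
The check daveshah describes (each tuple's colbuf_x should equal its user_x) could be sketched in Python as below. The tuple layout follows his message. The function name `mismatched_columns` and the sample values are hypothetical and are not taken from colbuf.py or from any real device.

```python
from typing import Iterable, List, Tuple

# (colbuf_x, colbuf_y, user_x, user_y), as described in the log
ColbufEntry = Tuple[int, int, int, int]

def mismatched_columns(entries: Iterable[ColbufEntry]) -> List[ColbufEntry]:
    """Return the entries whose column buffer is not in the user tile's column."""
    return [e for e in entries if e[0] != e[2]]

# Hypothetical sample data, made up for illustration only.
sample = [(1, 4, 1, 0), (1, 4, 1, 1), (2, 4, 3, 2)]
suspicious = mismatched_columns(sample)
```

An empty result would match the expectation stated above; any entry returned is worth a second look.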
corecodeaha18:50
corecodelast time i used it, i got assertion errors18:51
corecodebecause icebox is missing some data18:51
corecodethis is a big maze18:51
corecodewhat are those colbufs?18:51
daveshahbasically, the global network is split up into segments to save power18:52
daveshahthe colbufs are the buffers for a given line segment18:52
daveshahthere's an illustration at http://www.clifford.at/icestorm/io_tile.html18:52
tpbTitle: Project IceStorm IO Tile Documentation (at www.clifford.at)18:52
corecodeaaah! more documentation i didn't know about18:53
daveshahthe grey circles are the column buffers and the red lines indicate the tiles in which globals are driven by that buffer18:53
daveshahthere's a script to generate an svg like that somewhere too18:53
corecodeyea18:54
*** tannewt has quit IRC19:14
*** tannewt has joined #yosys19:16
*** rohitksingh has quit IRC19:30
corecodedaveshah: aha!  colbuf_io*.sh does not produce output with a ColBufCtrl line19:32
corecodedaveshah: so what's going on there19:32
daveshahcorecode: quite possible that the lm4k doesn't have IO colbufs (i.e. they are enabled all the time)19:33
corecodedoes this chip not have ColBufCtrl in IO cells?19:33
corecodeand what the hell is up with (7,20), why did the icecube placer throw an internal assert19:34
daveshahnot something I've ever seen19:34
daveshahperhaps that tile is broken19:34
corecodethat they noticed later?19:35
daveshahwithin the realms of possibility19:36
corecodeso from reading the code in icebox, it seems that the other dies have e.g. the bottom IO tile connected to the colbuf19:37
corecodebecause y==0 also maps to col buf source y=419:37
corecodebut somehow when running the colbuf_io script, i don't see any colbufctrl - what does that mean?19:38
corecodethe signal has to be routed somehow?19:38
corecodeor they always route it directly, and not with a colbuf?19:38
corecodewhy would this aspect be so different from the 5k19:39
daveshahmy guess is that it is always routed19:42
corecodewait, maybe the problem is a different one19:44
corecodeso if i understand the colbuf_io code right, it sets up an IO cell that uses a clk, therefore a global network19:45
daveshahyeah19:45
corecodethe input clock (from a different pin) will have to be routed via the global network19:45
corecodeor?19:45
corecodebecause what i'm seeing is that the clk signal is routed via standard routing19:46
daveshahah, that explains that one19:46
daveshahmodify the example code to put an SB_GB in between clock pin and SB_IO19:46
daveshaheg SB_GB gbuf (19:47
daveshah.USER_SIGNAL_TO_GLOBAL_BUFFER(clk_in),19:47
daveshah.GLOBAL_BUFFER_OUTPUT(clk)19:47
daveshah);19:47
corecodethe ram code does that19:48
corecodeinteresting that it worked for others?19:48
corecodei'm also using the newer icecube, so maybe that's a difference19:48
daveshahyes, quite possibly an icecube change19:49
corecodethanks19:49
corecodeknowing what to look for really helps :)19:49
corecodeso how do i get the numbers for the extra_bits?19:50
corecodethere are some comments i don't quite get19:50
corecodee.g.19:50
corecode        (1, 331, 143): ("padin_glb_netwk", "3"), # (1 3)  (331 144)  (331 144)  routing T_0_0.padin_3 <X> T_0_0.glb_netwk_319:50
corecode19:50
corecodewhere does the first tuple come from?19:50
daveshahThat comment is the text description of the bit from the GLB file19:51
daveshahthe first tuple comes from the .extra_bit in the asc or exp19:51
corecodeif it said (1, 331, 144) i'd understand it19:51
daveshahThere's an offset for some strange reason19:51
corecodefor some only?19:51
daveshahI can't remember the specifics19:51
corecodeok19:51
daveshahSo long as the first tuple comes from the asc/exp it should be fine19:52
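
A rough Python sketch of the workflow discussed here: collect the `.extra_bit` tuples from an asc file and look them up in a table shaped like the entry corecode quoted. It assumes each `.extra_bit` line carries three integers, as the quoted tuple suggests. The function name `read_extra_bits` and the sample text are hypothetical, and this is not icebox's own parser.

```python
from typing import Set, Tuple

def read_extra_bits(asc_text: str) -> Set[Tuple[int, ...]]:
    """Collect the integer tuples from all `.extra_bit` lines of an asc file."""
    bits = set()
    for line in asc_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == ".extra_bit":
            bits.add(tuple(int(f) for f in fields[1:]))
    return bits

# The single entry quoted in the log; a real table has more entries.
extra_bits_db = {(1, 331, 143): ("padin_glb_netwk", "3")}

# Hypothetical asc fragment, made up for illustration.
sample_asc = ".comment example\n.extra_bit 1 331 143\n"
found = [extra_bits_db.get(bit) for bit in sorted(read_extra_bits(sample_asc))]
```

Running this once per test design (one global input or oscillator at a time) would show which tuple belongs to which global network.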
corecodeso those i just need to place a global input and observe what extra bits are being set19:52
daveshahYes19:52
daveshahin some cases, it might be an oscillator rather than a global input19:52
corecodeyes19:52
corecodei guess i know what global network it is19:53
corecodeand gbufin locations i need as well, but i guess i should get that with the same test19:54
daveshahI think the datasheet should have those19:54
daveshahbeware the gbufin locations in the datasheet are for global input pins19:54
daveshahSB_GBs which drive from fabric are at the same locations, but drive a different network to the pin at that location19:55
corecodethat's also related to the padin_pio_db?19:55
daveshahYes, those are the input pin that drive each global19:56
corecodeit seems some dbs require a specific sequence, and i don't know what the sequence needs to match19:56
daveshahpadin_pio_db is in global network number order19:56
corecodei'm not sure which terminology is what19:57
daveshah"padin" refers to the dedicated route from a specific IO pin to a specific global network19:59
daveshah"gbufin" refers to the route from fabric (the fabout into an IO tile) into a global network20:00
*** m4ssi has joined #yosys20:09
*** m4ssi has quit IRC20:09
*** m4ssi has joined #yosys20:31
corecodeyea that was it, with the extra SB_GB the io tiles could be tested too20:44
*** m_w has joined #yosys20:50
*** maikmerten has quit IRC20:57
*** leviathanch has quit IRC21:00
*** m4ssi has quit IRC21:22
*** FL4SHK has joined #yosys22:02
*** indy has quit IRC22:06
*** indy has joined #yosys22:10
*** indy has quit IRC22:37
*** show1 has joined #yosys22:39
*** indy has joined #yosys22:43
corecodei'm surprised there is no automation for the global network stuff22:57
corecodeor i am missing it22:57
daveshahI don't think there is anything22:58
daveshahAs it is only 8 globals per device I don't think anyone bothered22:59
corecodeok22:59
corecodei guess i need to create a different footprint option to capture all information22:59
daveshahYes, quite possibly23:00
corecodeoh the pllauto script is fantastic23:06
daveshahI did that one when doing the UltraPlus23:07
daveshahIt doesn't do routing, but that is pretty quick to figure out with icebox_vlog (and only needs one design)23:07
corecodethe ultralite is very similar23:08
corecodei23:08
corecodei'm just trying to verify that the values are the same23:08
daveshahYeah, very sensible23:08
corecodei think the bit assignments are the same, pins/cells are different23:08
corecodepossibly the special different bits are different as well23:09
daveshahThe UltraPlus had a strange layout bitstream wise where one "half" was twice the height of the other23:10
corecodeand what do these extra bits do?23:11
daveshahThe 8 padin ones?23:11
corecodeno, the extra height ones23:14
daveshahOh, they are used to go from 3520 LUTs in the Ultra to 5280 in the UltraPlus23:15
corecodeah23:15
corecodeis the lm4k the ultra?23:15
daveshahNo, that's the iCE40LM which is older23:16
daveshahiCE5LP is Ultra23:16
corecodewhat device string is that in icebox?23:16
corecode5k is ultraplus23:17
daveshahlm4k is LM23:17
daveshahUltra isn't in icebox23:17
daveshahiceunpack uses u4k for Ultra23:18
corecodeah!23:19
corecodebut it is supported by nextpnr?23:19
daveshahNeither are supported by nextpnr23:20
corecodeaaah23:20
corecodeso what does it take to go from icestorm to nextpnr?23:20
daveshahJust looking at all the device specific cases and adding a new case23:21
daveshahe.g. Adding the database import from icestorm to CMake and adding the device name23:21
corecodethat sounds quite moderate23:21
daveshahMany of the cases will be the same as the up5k23:22
daveshahIt won't be much work at all23:23
corecodeoh i guess the IRDRV_BLOCK etc aren't even mapped?23:25
nats`https://code.electrolab.fr/nats/ice40up5k_base_project23:27
tpbTitle: nats / ice40up5k_base_project · GitLab (at code.electrolab.fr)23:27
nats`the up5k is supported by nextpnr23:27
nats`at least I made a small template to use it with yosys and nextpnr23:27
emebworks great23:28
corecodeman this is still going to be quite aways23:28
daveshahcorecode: no, you'll need to add support for those too23:28
daveshahYou can probably base that off the support for the UltraPlus RGB driver though23:28
corecodewhat happens if i don't add support?23:28
daveshahYou can't use that primitive23:29
corecode:D23:29
corecodefine23:29
daveshahEverything else will work fine23:29
corecodei want to just get my design going23:29
corecodeif icecube wouldn't fall on its face, i wouldn't be shaving this yak23:29
corecodehm LEDIP_BLOCK, RGBDRV_BLOCK, LEDDRVCUR_BLOCK, IRDRV_BLOCK23:36
corecodewhat's what now?23:36
daveshahLEDIP_BLOCK will be a PWM generator, basically just hard digital logic23:38
daveshahRGBDRV_BLOCK will be a 3-channel constant current driver for an RGB LED23:39
daveshahDon't know what LEDDRVCUR_BLOCK is, nothing like that on the UltraPlus23:39
daveshahIRDRV_BLOCK will be the IR driver23:39
daveshahThese should all have corresponding SB_ verilog primitives23:40
daveshahSee http://www.latticesemi.com/~/media/LatticeSemi/Documents/TechnicalBriefs/SBTICETechnologyLibrary201608.pdf23:42
emebThe Ultra parts had some sort of current source that had to be specifically hooked to the LED driver.23:42
emebUltra Plus doesn't need that.23:42
corecodei gotta say, this work is one of the least pleasurable things, and i've wasted a lot of time on obscure stuff23:46
