*** tpb has joined #yosys | 00:00 | |
corecode | daveshah: sorry to keep asking you | 00:51 |
---|---|---|
corecode | how do i find the extra_bits_db (padin_glb_netwk) | 00:51 |
corecode | hm, is it as easy as reading this file? | 00:53 |
corecode | yea i see numbers that are very similar, but one off, like for the 1k | 00:55 |
*** seldridge has quit IRC | 01:28 | |
*** emeb has left #yosys | 01:38 | |
corecode | heh, icecube says Internal Error: Assumption 'arch->IsPlaceable(y, x)' failed in plGraphConverter.cpp line 1359 | 01:38 |
corecode | when placing a DFF at tile (7, 20) | 01:38 |
*** emeb_mac has joined #yosys | 01:42 | |
*** citypw has joined #yosys | 02:02 | |
*** gsi__ has joined #yosys | 02:12 | |
*** gsi_ has quit IRC | 02:15 | |
*** leviathanch has joined #yosys | 02:40 | |
*** seldridge has joined #yosys | 03:16 | |
*** rohitksingh has joined #yosys | 03:58 | |
*** seldridge has quit IRC | 04:06 | |
promach | ZipCPU: see https://i.stack.imgur.com/BQd2X.png | 04:22 |
*** pie___ has joined #yosys | 04:29 | |
*** rohitksingh has quit IRC | 04:31 | |
*** pie__ has quit IRC | 04:32 | |
*** jevinskie has joined #yosys | 04:39 | |
*** rohitksingh_work has joined #yosys | 04:41 | |
*** promach has quit IRC | 04:51 | |
*** promach has joined #yosys | 05:22 | |
*** m_w has quit IRC | 05:50 | |
*** develonepi3 has quit IRC | 07:03 | |
*** emeb_mac has quit IRC | 07:13 | |
*** rohitksingh_work has quit IRC | 07:55 | |
*** rohitksingh_work has joined #yosys | 07:58 | |
*** dys has quit IRC | 08:06 | |
*** awordnot has quit IRC | 08:17 | |
*** awordnot has joined #yosys | 08:19 | |
*** leviathanch has quit IRC | 08:42 | |
promach | ZipCPU: see the updated version https://i.imgur.com/oPAVWdh.png | 08:44 |
promach | wait, I just found another corner case :| | 09:04 |
promach | ZipCPU: I have just updated https://gist.github.com/promach/5f2d9a9494704ed93cf65687c982198c#file-multiply-v | 09:19 |
tpb | Title: A signed multiply verilog code using row adder tree multiplier and modified baugh-wooley algorithm · GitHub (at gist.github.com) | 09:19 |
daveshah | corecode: yeah, just try a SB_GB_IO (or internal oscillator, probably for two of the cases) at each global input location | 09:19 |
daveshah | and see which extra_bit appears in the asc file | 09:19 |
*** rohitksingh_work has quit IRC | 09:49 | |
*** rohitksingh_work has joined #yosys | 09:52 | |
*** citypw has quit IRC | 09:59 | |
sxpert | promach: image is gone | 10:12 |
promach | sxpert: see the gist, wait let me update the image | 10:14 |
sxpert | ah ok | 10:15 |
*** rohitksingh_work has quit IRC | 10:16 | |
promach | there is still a bug when A_WIDTH != B_WIDTH | 10:24 |
sxpert | I see | 10:27 |
promach | sxpert: ok, it works now | 10:35 |
sxpert | the url probably changed | 10:37 |
promach | wait, let me update the code. give me 15 minutes | 10:37 |
tnt | I'm wondering if there are algorithms that specifically target LUT4 archs so that each layer uses all 4 inputs of the LUT and not just 2. | 10:40 |
tnt | like doing the partial multiply 2 bits at a time instead of 1 bit at a time. | 10:41 |
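
A minimal sketch of the "2 bits at a time" idea tnt describes, written as plain Python arithmetic rather than hardware. The function name `multiply_radix4` and the `width` parameter are made up for illustration; this only shows that taking 2-bit digits of one operand halves the number of partial products, and says nothing about how the result would actually pack into LUT4s.

```python
def multiply_radix4(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply that consumes two bits of b per step (illustrative only)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    # Step through b two bits at a time: width/2 partial products instead of width.
    for shift in range(0, width, 2):
        digit = (b >> shift) & 0b11       # 2-bit digit of b: 0, 1, 2 or 3
        product += (a * digit) << shift   # partial product for this digit
    return product


if __name__ == "__main__":
    print(multiply_radix4(13, 11))
```

The signed (Baugh-Wooley) case from promach's gist would need extra sign handling that this unsigned sketch leaves out.
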
*** leviathanch has joined #yosys | 10:44 | |
promach | sxpert : https://gist.github.com/promach/5f2d9a9494704ed93cf65687c982198c#file-multiply-v | 10:47 |
tpb | Title: A signed multiply verilog code using row adder tree multiplier and modified baugh-wooley algorithm · GitHub (at gist.github.com) | 10:47 |
promach | try it out and see whether this meets your needs | 10:48 |
promach | and probably find out some corner cases that my assert() is not capable of finding | 10:48 |
promach | sxpert : just try it out first | 10:48 |
promach | probably I will need to solve the induction bugs before knowing which assert() I had missed | 10:51 |
sxpert | as ZipCPU would say, have you tried formal methods ? | 11:04 |
*** rohitksingh_work has joined #yosys | 11:04 | |
promach | sxpert : induction is part of yosys-smtbmc | 11:10 |
promach | and yosys-smtbmc is a formal tool | 11:11 |
promach | and induction can help find bugs within the actual verilog source code itself as well as the formal code | 11:24 |
promach | sxpert : https://i.imgur.com/BLYZTi6.png should stay valid until induction shows me otherwise later | 11:31 |
*** s_frit has quit IRC | 11:32 | |
*** s_frit has joined #yosys | 11:33 | |
promach | use pencil and paper method to check this countermeasure | 11:33 |
promach | I cannot be 100 percent sure about the correctness of this countermeasure since I have not done a rigorous maths proof of it | 11:34 |
promach | and the code has not passed induction yet | 11:34 |
*** rohitksingh_work has quit IRC | 11:47 | |
corecode | yea i don't know what the colbufs are, so i don't know how to look for them in the files | 11:55 |
*** s_frit has quit IRC | 12:10 | |
*** s_frit has joined #yosys | 12:11 | |
*** jevinskie has quit IRC | 12:15 | |
tnt | corecode: AFAIR they're buffers that distribute the global networks to various parts of the chip, and if some global isn't needed in some area, you can disable those buffers to save power. | 12:42 |
tnt | and I think nextpnr currently just enables them all globally, regardless of whether they're actually needed. | 12:42 |
*** s_frit has quit IRC | 13:19 | |
*** s_frit has joined #yosys | 13:20 | |
*** rohitksingh has joined #yosys | 13:26 | |
*** jevinskie has joined #yosys | 14:23 | |
*** citypw has joined #yosys | 14:41 | |
*** seldridge has joined #yosys | 14:56 | |
*** jevinskie has quit IRC | 15:26 | |
*** maikmerten has joined #yosys | 16:36 | |
*** jevinskie has joined #yosys | 16:38 | |
*** develonepi3 has joined #yosys | 16:39 | |
*** gsi__ is now known as gsi_ | 17:02 | |
*** rohitksingh has quit IRC | 17:04 | |
*** gruetzkopf has quit IRC | 17:13 | |
*** kerel has quit IRC | 17:13 | |
*** gruetzkopf has joined #yosys | 17:14 | |
develonepi3 | daveshah: I have seen your presentation several times and have enjoyed it very much. Your comments that "Most FPGA Development use closed-source tools, FPGA vendors don't document bitstreams" are right on point. You and others, such as ZipCPU and Clifford Wolf, have advanced the FPGA discipline more in the past few years than others have in decades. I think that we are now at a cusp where more people will start using FPGAs. I have been working in | 17:15 |
develonepi3 | Compressing Numerical Meteorological Modeled Data for many years. This work, a Karhunen-Loeve transform (KLT) in the vertical direction and JPEG 2000 on XY slices, has been abandoned. I recently started working on Bare Metal for the Raspberry Pi3B+ using Ultibo. I think this is now achievable with your ECP5 efforts and the Raspberry Pi3B+ running Bare-Metal. | 17:15 |
*** kerel has joined #yosys | 17:15 | |
*** m4ssi has quit IRC | 17:35 | |
*** jevinskie has quit IRC | 17:37 | |
*** rohitksingh has joined #yosys | 17:40 | |
corecode | i doubt you'd see a performance improvement for compute between running linux or ultibo | 17:46 |
ZipCPU | corecode: I'm curious why you'd say that | 17:50 |
*** develonepi3 has quit IRC | 18:24 | |
corecode | ZipCPU: because typically compute means that there is no kernel executing for most of the time | 18:28 |
*** emeb has joined #yosys | 18:35 | |
*** dys has joined #yosys | 18:39 | |
*** develonepi3 has joined #yosys | 18:41 | |
ZipCPU | ... and, go on | 18:43 |
corecode | given that the kernel doesn't run much at all, it is unlikely that you see performance differences | 18:44 |
ZipCPU | Sorry, I guess I misread your response. You meant between Linux and Ultibo, and I thought you meant (Linux and Ultibo) vs FPGA | 18:45 |
corecode | oh no | 18:45 |
daveshah | FWIW, if this is floating point heavy then there's little chance of the ECP5 beating the Pi, you'd probably need something much fancier, unless you are very clever about how you describe it | 18:47 |
corecode | hi daveshah | 18:47 |
daveshah | otoh I can easily see the ECP5 winning if you get it fixedpoint/integer | 18:47 |
daveshah | hi corecode! | 18:47 |
corecode | you're just the guy i was looking for | 18:47 |
corecode | i'm trying to button up this icestorm stuff | 18:47 |
ZipCPU | daveshah: When I last examined the algorithm, it was I/O (i.e. SDRAM) bound | 18:47 |
corecode | what am i looking for in the colbuf_logic output? | 18:48 |
corecode | because i got it running (except for one tile) | 18:48 |
corecode | but now i don't know what i am looking for | 18:48 |
daveshah | You should see 4-tuples (colbuf_x, colbuf_y, user_x, user_y)? | 18:48 |
daveshah | Hopefully colbuf_x and user_x are the same | 18:49 |
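
A small sketch of the sanity check daveshah mentions, assuming the 4-tuples `(colbuf_x, colbuf_y, user_x, user_y)` have already been collected into a Python list. The helper name `group_by_colbuf` and the sample tuples are made up for illustration and are not real device data.

```python
from collections import defaultdict


def group_by_colbuf(tuples):
    """Map each column buffer tile to the user tiles it drives.

    Also returns every tuple whose colbuf_x differs from user_x, since
    the column buffer is expected to sit in the same column.
    """
    driven = defaultdict(list)
    mismatched = []
    for colbuf_x, colbuf_y, user_x, user_y in tuples:
        if colbuf_x != user_x:
            mismatched.append((colbuf_x, colbuf_y, user_x, user_y))
        driven[(colbuf_x, colbuf_y)].append((user_x, user_y))
    return dict(driven), mismatched


# Made-up sample tuples, NOT taken from any real chip database.
sample = [(1, 4, 1, 0), (1, 4, 1, 1), (1, 4, 1, 2), (2, 4, 2, 1)]
driven, mismatched = group_by_colbuf(sample)

if __name__ == "__main__":
    print(driven)
    print(mismatched)
```

An empty `mismatched` list is the "hopefully colbuf_x and user_x are the same" outcome.
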
corecode | no, that must be a different script | 18:49 |
corecode | you mean colbuf_io? | 18:49 |
corecode | there are 3 different colbuf scripts | 18:49 |
daveshah | ah, I think you might need to use colbuf.py to parse the colbuf_logic output | 18:49 |
daveshah | ie, pass all the .exp files created by colbuf_logic to colbuf.py | 18:50 |
corecode | aha | 18:50 |
corecode | last time i used it, i got assertion errors | 18:51 |
corecode | because icebox is missing some data | 18:51 |
corecode | this is a big maze | 18:51 |
corecode | what are those colbufs? | 18:51 |
daveshah | basically, the global network is split up into segments to save power | 18:52 |
daveshah | the colbufs are the buffers for a given line segment | 18:52 |
daveshah | there's an illustration at http://www.clifford.at/icestorm/io_tile.html | 18:52 |
tpb | Title: Project IceStorm IO Tile Documentation (at www.clifford.at) | 18:52 |
corecode | aaah! more documentation i didn't know about | 18:53 |
daveshah | the grey circles are the column buffers and the red lines indicate the tiles in which globals are driven by that buffer | 18:53 |
daveshah | there's a script to generate an svg like that somewhere too | 18:53 |
corecode | yea | 18:54 |
*** tannewt has quit IRC | 19:14 | |
*** tannewt has joined #yosys | 19:16 | |
*** rohitksingh has quit IRC | 19:30 | |
corecode | daveshah: aha! colbuf_io*.sh does not produce output with a ColBufCtrl line | 19:32 |
corecode | daveshah: so what's going on there | 19:32 |
daveshah | corecode: quite possible that the lm4k doesn't have IO colbufs (i.e. they are enabled all the time) | 19:33 |
corecode | does this chip not have ColBufCtrl in IO cells? | 19:33 |
corecode | and what the hell is up with (7,20), why did the icecube placer throw an internal assert | 19:34 |
daveshah | not something I've ever seen | 19:34 |
daveshah | perhaps that tile is broken | 19:34 |
corecode | that they noticed later? | 19:35 |
daveshah | within the realms of possibility | 19:36 |
corecode | so from reading the code in icebox, it seems that the other dies have e.g. the bottom IO tile connected to the colbuf | 19:37 |
corecode | because y==0 also maps to col buf source y=4 | 19:37 |
corecode | but somehow when running the colbuf_io script, i don't see any colbufctrl - what does that mean? | 19:38 |
corecode | the signal has to be routed somehow? | 19:38 |
corecode | or they always route it directly, and not with a colbuf? | 19:38 |
corecode | why would this aspect be so different from the 5k | 19:39 |
daveshah | my guess is that it is always routed | 19:42 |
corecode | wait, maybe the problem is a different one | 19:44 |
corecode | so if i understand the colbuf_io code right, it sets up an IO cell that uses a clk, therefore a global network | 19:45 |
daveshah | yeah | 19:45 |
corecode | the input clock (from a different pin) will have to be routed via the global network | 19:45 |
corecode | or? | 19:45 |
corecode | because what i'm seeing is that the clk signal is routed via standard routing | 19:46 |
daveshah | ah, that explains that one | 19:46 |
daveshah | modify the example code to put an SB_GB in between clock pin and SB_IO | 19:46 |
daveshah | eg SB_GB gbuf ( | 19:47 |
daveshah | .USER_SIGNAL_TO_GLOBAL_BUFFER(clk_in), | 19:47 |
daveshah | .GLOBAL_BUFFER_OUTPUT(clk) | 19:47 |
daveshah | ); | 19:47 |
corecode | the ram code does that | 19:48 |
corecode | interesting that it worked for others? | 19:48 |
corecode | i'm also using the newer icecube, so maybe that's a difference | 19:48 |
daveshah | yes, quite possibly an icecube change | 19:49 |
corecode | thanks | 19:49 |
corecode | knowing what to look for really helps :) | 19:49 |
corecode | so how do i get the numbers for the extra_bits? | 19:50 |
corecode | there are some comments i don't quite get | 19:50 |
corecode | e.g. | 19:50 |
corecode | (1, 331, 143): ("padin_glb_netwk", "3"), # (1 3) (331 144) (331 144) routing T_0_0.padin_3 <X> T_0_0.glb_netwk_3 | 19:50 |
corecode | where does the first tuple come from? | 19:50 |
daveshah | That comment is the text description of the bit from the GLB file | 19:51 |
daveshah | the first tuple comes from the .extra_bit in the asc or exp | 19:51 |
corecode | if it said (1, 331, 144) i'd understand it | 19:51 |
daveshah | There's an offset for some strange reason | 19:51 |
corecode | for some only? | 19:51 |
daveshah | I can't remember the specifics | 19:51 |
corecode | ok | 19:51 |
daveshah | So long as the first tuple comes from the asc/exp it should be fine | 19:52 |
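
A rough sketch of the "place a global input and see which extra bit shows up" workflow, assuming `.extra_bit` lines in the asc text carry three integers, as the tuple `(1, 331, 143)` quoted above suggests. The helper names and the sample asc snippets are hypothetical; the baseline bit `0 330 142` is invented purely so there is something to subtract.

```python
def parse_extra_bits(asc_text):
    """Collect every '.extra_bit a b c' line as an (a, b, c) tuple."""
    bits = set()
    for line in asc_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == ".extra_bit":
            bits.add(tuple(int(f) for f in fields[1:]))
    return bits


def new_extra_bits(baseline_asc, test_asc):
    """Extra bits present in the test design but not in the baseline."""
    return sorted(parse_extra_bits(test_asc) - parse_extra_bits(baseline_asc))


def format_db_entry(bit, net):
    """Render one entry in the style of the extra_bits_db line quoted above."""
    return '%r: ("padin_glb_netwk", "%d"),' % (bit, net)


# Made-up asc fragments for illustration only.
baseline = ".extra_bit 0 330 142\n"
with_global_input = ".extra_bit 0 330 142\n.extra_bit 1 331 143\n"

if __name__ == "__main__":
    for bit in new_extra_bits(baseline, with_global_input):
        print(format_db_entry(bit, 3))
```

Which global network number to pass as `net` still has to come from knowing which input (or oscillator) was placed in the test design.
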
corecode | so those i just need to place a global input and observe what extra bits are being set | 19:52 |
daveshah | Yes | 19:52 |
daveshah | in some cases, it might be an oscillator rather than a global input | 19:52 |
corecode | yes | 19:52 |
corecode | i guess i know what global network it is | 19:53 |
corecode | and gbufin locations i need as well, but i guess i should get that with the same test | 19:54 |
daveshah | I think the datasheet should have those | 19:54 |
daveshah | beware the gbufin locations in the datasheet are for global input pins | 19:54 |
daveshah | SB_GBs which drive from fabric are at the same locations, but drive a different network to the pin at that location | 19:55 |
corecode | that's also related to the padin_pio_db? | 19:55 |
daveshah | Yes, those are the input pins that drive each global | 19:56 |
corecode | it seems some dbs require a specific sequence, and i don't know what the sequence needs to match | 19:56 |
daveshah | padin_pio_db is in global network number order | 19:56 |
corecode | i'm not sure which terminology is what | 19:57 |
daveshah | "padin" refers to the dedicated route from a specific IO pin to a specific global network | 19:59 |
daveshah | "gbufin" refers to the route from fabric (the fabout into an IO tile) into a global network | 20:00 |
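
A sketch of the ordering rule for padin_pio_db, assuming each entry is an `(x, y, z)` pin location and that there are 8 global networks per device. The helper `build_padin_pio_db` and the placeholder locations are invented for illustration; the tuple shape is an assumption, not something stated in this conversation.

```python
def build_padin_pio_db(observations, num_globals=8):
    """Turn {global network number: (x, y, z)} into a list in network order.

    Raises if any global network has no observed padin location, so a
    missing experiment is noticed instead of silently shifting the list.
    """
    missing = [n for n in range(num_globals) if n not in observations]
    if missing:
        raise ValueError("no padin observed for global networks %s" % missing)
    return [observations[n] for n in range(num_globals)]


# Placeholder locations only: (x, y, z) = (10 + n, 0, 1) for network n.
observed = {n: (10 + n, 0, 1) for n in range(8)}
padin_pio_db = build_padin_pio_db(observed)

if __name__ == "__main__":
    for net, loc in enumerate(padin_pio_db):
        print(net, loc)
```

Keeping the observations keyed by network number until the very end avoids having to remember the required sequence while running the individual experiments.
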
*** m4ssi has joined #yosys | 20:09 | |
*** m4ssi has quit IRC | 20:09 | |
*** m4ssi has joined #yosys | 20:31 | |
corecode | yea that was it, with the extra SB_GB the io tiles could be tested too | 20:44 |
*** m_w has joined #yosys | 20:50 | |
*** maikmerten has quit IRC | 20:57 | |
*** leviathanch has quit IRC | 21:00 | |
*** m4ssi has quit IRC | 21:22 | |
*** FL4SHK has joined #yosys | 22:02 | |
*** indy has quit IRC | 22:06 | |
*** indy has joined #yosys | 22:10 | |
*** indy has quit IRC | 22:37 | |
*** show1 has joined #yosys | 22:39 | |
*** indy has joined #yosys | 22:43 | |
corecode | i'm surprised there is no automation for the global network stuff | 22:57 |
corecode | or i am missing it | 22:57 |
daveshah | I don't think there is anything | 22:58 |
daveshah | As it is only 8 globals per device I don't think anyone bothered | 22:59 |
corecode | ok | 22:59 |
corecode | i guess i need to create a different footprint option to capture all information | 22:59 |
daveshah | Yes, quite possibly | 23:00 |
corecode | oh the pllauto script is fantastic | 23:06 |
daveshah | I did that one when doing the UltraPlus | 23:07 |
daveshah | It doesn't do routing, but that is pretty quick to figure out with icebox_vlog (and only needs one design) | 23:07 |
corecode | the ultralite is very similar | 23:08 |
corecode | i | 23:08 |
corecode | i'm just trying to verify that the values are the same | 23:08 |
daveshah | Yeah, very sensible | 23:08 |
corecode | i think the bit assignments are the same, pins/cells are different | 23:08 |
corecode | possibly the special different bits are different as well | 23:09 |
daveshah | The UltraPlus had a strange layout bitstream wise where one "half" was twice the height of the other | 23:10 |
corecode | and what do these extra bits do? | 23:11 |
daveshah | The 8 padin ones? | 23:11 |
corecode | no, the extra height ones | 23:14 |
daveshah | Oh, they are used to go from 3520 LUTs in the Ultra to 5280 in the UltraPlus | 23:15 |
corecode | ah | 23:15 |
corecode | is the lm4k the ultra? | 23:15 |
daveshah | No, that's the iCE40LM which is older | 23:16 |
daveshah | iCE5LP is Ultra | 23:16 |
corecode | what device string is that in icebox? | 23:16 |
corecode | 5k is ultraplus | 23:17 |
daveshah | lm4k is LM | 23:17 |
daveshah | Ultra isn't in icebox | 23:17 |
daveshah | iceunpack uses u4k for Ultra | 23:18 |
corecode | ah! | 23:19 |
corecode | but it is supported by nextpnr? | 23:19 |
daveshah | Neither are supported by nextpnr | 23:20 |
corecode | aaah | 23:20 |
corecode | so what does it take to go from icestorm to nextpnr? | 23:20 |
daveshah | Just looking at all the device specific cases and adding a new case | 23:21 |
daveshah | e.g. Adding the database import from icestorm to CMake and adding the device name | 23:21 |
corecode | that sounds quite moderate | 23:21 |
daveshah | Many of the cases will be the same as the up5k | 23:22 |
daveshah | It won't be much work at all | 23:23 |
corecode | oh i guess the IRDRV_BLOCK etc aren't even mapped? | 23:25 |
nats` | https://code.electrolab.fr/nats/ice40up5k_base_project | 23:27 |
tpb | Title: nats / ice40up5k_base_project · GitLab (at code.electrolab.fr) | 23:27 |
nats` | the up5k is supported by nextpnr | 23:27 |
nats` | at least I made a small template to use it with yosys and nextpnr | 23:27 |
emeb | works great | 23:28 |
corecode | man this is still going to be quite a ways | 23:28 |
daveshah | corecode: no, you'll need to add support for those too | 23:28 |
daveshah | You can probably base that off the support for the UltraPlus RGB driver though | 23:28 |
corecode | what happens if i don't add support? | 23:28 |
daveshah | You can't use that primitive | 23:29 |
corecode | :D | 23:29 |
corecode | fine | 23:29 |
daveshah | Everything else will work fine | 23:29 |
corecode | i want to just get my design going | 23:29 |
corecode | if icecube wouldn't fall on its face, i wouldn't be shaving this yak | 23:29 |
corecode | hm LEDIP_BLOCK, RGBDRV_BLOCK, LEDDRVCUR_BLOCK, IRDRV_BLOCK | 23:36 |
corecode | what's what now? | 23:36 |
daveshah | LEDIP_BLOCK will be a PWM generator, basically just hard digital logic | 23:38 |
daveshah | RGBDRV_BLOCK will be a 3-channel constant-current driver for an RGB LED | 23:39 |
daveshah | Don't know what LEDDRVCUR_BLOCK is, nothing like that on the UltraPlus | 23:39 |
daveshah | IRDRV_BLOCK will be the IR driver | 23:39 |
daveshah | These should all have corresponding SB_ verilog primitives | 23:40 |
daveshah | See http://www.latticesemi.com/~/media/LatticeSemi/Documents/TechnicalBriefs/SBTICETechnologyLibrary201608.pdf | 23:42 |
emeb | The Ultra parts had some sort of current source that had to be specifically hooked to the LED driver. | 23:42 |
emeb | Ultra Plus doesn't need that. | 23:42 |
corecode | i gotta say, this work is one of the least pleasurable things, and i've wasted a lot of time on obscure stuff | 23:46 |