*** tpb has joined #yosys | 00:00 | |
*** dh73 has quit IRC | 00:06 | |
*** rohitksingh has quit IRC | 00:10 | |
*** emeb has quit IRC | 00:26 | |
*** X-Scale has quit IRC | 01:03 | |
*** X-Scale` has joined #yosys | 01:04 | |
*** X-Scale` is now known as X-Scale | 01:05 | |
*** rohitksingh has joined #yosys | 01:45 | |
*** kraiskil has quit IRC | 01:56 | |
*** citypw has joined #yosys | 03:41 | |
*** nrossi has joined #yosys | 03:49 | |
*** dnotq has quit IRC | 03:52 | |
*** rektide has quit IRC | 03:56 | |
*** dys has quit IRC | 06:51 | |
*** dys has joined #yosys | 07:31 | |
*** Jybz has joined #yosys | 08:02 | |
*** dys has quit IRC | 08:10 | |
*** freemint has joined #yosys | 08:10 | |
*** freemint has quit IRC | 08:15 | |
pepijndevos | Which operations does $alu implement? It seems just addition and subtraction? | 08:30 |
---|---|---|
pepijndevos | A building block supporting both binary addition/subtraction operations, and | 08:31 |
pepijndevos | indirectly, comparison operations. | 08:31 |
whitequark | yes | 08:32 |
pepijndevos | indirectly?? Meaning it wraps a subtract in some logic to do comparison? | 08:32 |
whitequark | I think so yes | 08:32 |
pepijndevos | What I also don't get is the X output which always just seems to be the XOR of the inputs | 08:33 |
whitequark | yes, that's what it is | 08:34 |
whitequark | take a look at the circuit for a full adder | 08:34 |
pepijndevos | But is there any case where this XOR is actually part of the adder hardware? At least in all implementations I've seen it's literally an XOR, not part of the vendor ALU primitive. | 08:37 |
*** kraiskil has joined #yosys | 08:37 | |
whitequark | no idea | 08:38 |
pepijndevos | ok | 08:38 |
whitequark | not on FPGAs for sure | 08:38 |
pepijndevos | Actually, I'm not sure what you mean with the full adder, because the ALU already has a carry in and carry out too, right? | 08:39 |
mwk | pepijndevos: I don't think I understand your question really | 08:39 |
mwk | the thing about FPGAs is, they don't really have alu cells | 08:40 |
mwk | (except the ones in DSP slices, but they're a completely different thing) | 08:40 |
pepijndevos | Right, just a LUT with a hardware carry | 08:41 |
mwk | what FPGAs really do is reuse as much of the LUT as possible, and have the bare minimum of special carry chain logic | 08:41 |
mwk | and maybe an extra XOR gate (for xilinx) | 08:41 |
pepijndevos | hmmm okay | 08:42 |
pepijndevos | So what I'm thinking about is... the Gowin ALU supports special comparison modes, but there does not seem to be a straightforward way to map an $alu cell to them as the comparison is "indirect" | 08:45 |
daveshah | Realistically they'll be just setting the LUT init to some value | 08:45 |
pepijndevos | Right, but if Yosys just sets them to add/sub and then wraps extra logic around them for comparison, it could be less than ideal I suppose. | 08:46 |
daveshah | I would be careful creating too much hard logic for comparison as it might prevent optimisation down the line | 08:47 |
pepijndevos | But then, IIRC I did not actually manage to make the vendor tools issue some of these modes, so maybe they are actually the same LUT contents. | 08:47 |
daveshah | Older Lattice FPGAs had these modes too | 08:47 |
daveshah | But they definitely didn't have any special hardware behind them beyond the LUT and carry logic | 08:47 |
pepijndevos | Right | 08:47 |
mwk | heh | 08:47 |
mwk | $lt/$gt could actually be implemented in real clever ways on xilinx | 08:48 |
mwk | using the carry chain | 08:48 |
mwk | too bad we don't do it | 08:48 |
daveshah | Yeah, although the usual issues with current mapping and optimisation mean that sort of stuff can deny optimisations elsewhere | 08:49 |
mwk | for comparison between two variables, you can do two bits per LUT; for comparison between a variable and an arbitrary constant, five bits per LUT | 08:49 |
mwk | right :( | 08:49 |
lukego | Added the yosys "show" GUI dependencies to the nixpkgs package if anybody is interested. was a little fiddly with icons. as a separate package to avoid pulling it all in with the compiler. https://github.com/NixOS/nixpkgs/pull/73856 | 09:26 |
tpb | Title: yosys-withGui: New wrapper derivation with GUI runtime dependencies by lukego · Pull Request #73856 · NixOS/nixpkgs · GitHub (at github.com) | 09:26 |
lukego | newbie alert: How do I make a ROM at the RTLIL level? I feel like $memrd is the right primitive but I don't see how to supply an initial value. | 09:38 |
daveshah | lukego: $meminit | 09:39 |
lukego | hm the .tex docs on master seem to talk more about meminit than http://www.clifford.at/yosys/files/yosys_manual.pdf | 09:41 |
daveshah | They might have been improved since the 0.9 release, when that was probably generated | 09:42 |
lukego | Is there a good place to look for RTLIL examples to help interpret the docs? Or source is the best bet? | 09:43 |
daveshah | Write some Verilog and look at the RTLIL it produces | 09:44 |
lukego | I thought the whole point of RTLIL is that it lets me avoid learning Verilog ;-) | 09:44 |
daveshah | RTLIL is not intended to be human writeable outside of testing | 09:45 |
lukego | (I started with some Verilog code that did ROM-like things as a "switch" statement but that didn't get mapped onto memory, just logic) | 09:45 |
daveshah | You would want to use an initial block or $readmem[hb] to get $meminit cells | 09:45 |
lukego | I'm hoping that RTLIL can serve as a foundation like FIRRTL i.e. provide "simple as possible but no simpler" abstractions to build on top of. Like an instruction set so to speak. Long term goal is not to write code by hand. | 09:47 |
lukego | So I'm looking at ilang as the "main thing" here and the Verilog frontend as some esoterica. | 09:47 |
lukego | but thanks I will chase these refs in the source :) | 09:47 |
daveshah | The main reason not to hand write rtlil is a lack of parameterisation or generate-for type structures | 09:48 |
daveshah | Because it only exists after the first parts of elaboration are done | 09:49 |
*** d__ has quit IRC | 09:49 | |
lukego | Right. Maybe it's easiest to say that my goal is to write a frontend here, albeit I don't know much about how that will look yet, and so I'm trying to understand the "mid-end" for now | 09:49 |
daveshah | I see | 09:49 |
lukego | I'm thinking I'll start with RTLIL and add basic macro-like abstractions on top of that, and see how far that gets me as a learning exercise. | 09:50 |
lukego | but I take the point not to expect a large corpus of hand-written RTLIL example programs | 09:50 |
*** d__ has joined #yosys | 09:51 | |
emily | lukego: nmigen is an existing compiles-to-RTLIL eDSL | 09:54 |
emily | https://github.com/m-labs/nmigen | 09:54 |
tpb | Title: GitHub - m-labs/nmigen: A refreshed Python toolbox for building complex digital hardware (at github.com) | 09:54 |
emily | you can probably learn whatever you want to know from its source code | 09:54 |
lukego | oh neat thanks emily. I know of nmigen but I didn't know it targets RTLIL. | 09:55 |
emily | https://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py | 09:55 |
tpb | Title: nmigen/rtlil.py at master · m-labs/nmigen · GitHub (at github.com) | 09:55 |
*** kraiskil has quit IRC | 10:17 | |
*** rohitksingh has quit IRC | 10:19 | |
*** rohitksingh has joined #yosys | 10:21 | |
*** rohitksingh_ has joined #yosys | 10:26 | |
*** rohitksingh has quit IRC | 10:27 | |
lukego | Oh hey :-) Suppose that I wanted to get my hands on a high-end FPGA like a Kintex UltraScale+. How do hacker types usually do that? Seems like buying it off the shelf for list price is not the answer | 10:32 |
daveshah | Have a look for used bitcoin miner boards? | 10:34 |
daveshah | vcu1525 | 10:34 |
daveshah | and derivatives | 10:34 |
daveshah | Or just use AWS F1 if you don't want to do any custom IO interfacing | 10:34 |
lukego | I'm mostly interested in I/O. My dream setup right now would be an AMD Threadripper with all of its PCIe lanes connected to FPGA(s) | 10:35 |
lukego | So that's ~100 PCIe gen4 | 10:35 |
daveshah | Well if you only care about PCIe then AWS F1 might work | 10:35 |
daveshah | But it's nowhere near that many lanes | 10:36 |
daveshah | Something like x4 or x8 iirc | 10:36 |
lukego | oh hm that's a really good point thanks! | 10:36 |
daveshah | You can pay more for a box with something like 8 FPGA cards too | 10:36 |
lukego | I'm really interested in networking applications where the FPGA would be hooked up to a bunch of 100GbE ports, but I think it's reasonable to treat that as a separate problem i.e. do the PCIe I/O and the Ethernet I/O work separately. | 10:37 |
lukego | That sounds extremely promising. I'd like to do things like benchmark the CPU from the PCIe side to see where the pain points are on the uncore/cache/ram | 10:37 |
lukego | and an EC2 machine with 8xFPGA is probably a perfect solution for 0.1% the cost of buying the equipment for a 10 minute test. | 10:38 |
daveshah | The one with 8 cards is $13/hr | 10:38 |
daveshah | Even if you used it 24/7 it would be good value compared to buying in low qty | 10:39 |
daveshah | Such is the discount that Amazon gets | 10:39 |
lukego | So even cheaper because I don't think you get that hardware for $13K :) | 10:39 |
lukego | Yeah I've been told privately that I would faint if I saw what Amazon pay for FPGAs. | 10:39 |
daveshah | I would guess it is about 10-20x cheaper than DigiKey list | 10:40 |
lukego | Seems like an anti-hacker conspiracy when BigCo and universities all get 10x discount but oh well | 10:40 |
lukego | I suppose that if you found the right approach to Xilinx etc they would shower you with hardware e.g. if you are consulting on a project for one of their big customers/prospects. | 10:41 |
lukego | but it's exhausting to play that game :) | 10:41 |
daveshah | Even iCE40s have at least a 5x discount in 1e6 quantity | 10:41 |
daveshah | I've been there before and it's easier said than done | 10:41 |
lukego | Quantity sure makes a difference. I wanted to buy an ECP5 based NIC but it costs like $600 and that just seems like a lot compared with the FPGA, even though I know that's the reality of producing the boards for a really low quantity market. | 10:42 |
daveshah | The first quotation I saw as a small company in the UK a few years ago for 1k Zynqs was above DigiKey single pricing | 10:42 |
daveshah | For the smaller parts it seems like 100k is where it gets good | 10:43 |
lukego | thanks muchly for the tips | 10:43 |
lukego | lunch time here | 10:43 |
daveshah | Enjoy | 10:45 |
lukego | So follow-up question if I may :-) Supposing I would ultimately aim for EC2 as a high-end FPGA env. How might I prototype in a meaningful way e.g. with affordable hardware that I can use as a proxy for the real thing? And could I use Yosys/nextpnr for F1 or is that a different universe? | 11:13 |
mwk | yosys, yes; nextpnr, no | 11:18 |
sorear | you'll likely have a rather small number of pins you can connect to pcie lanes per fpga | 11:19 |
mwk | that too | 11:20 |
mwk | unless you get one of the outrageously expensive ones | 11:20 |
sorear | I haven't run the numbers for ultrascale+ but on ecp5-5G the aggregate throughput of the "slow" I/Os is about 6x the aggregate throughput of the fast IOs/SERDESes because there are so many more of the former | 11:20 |
daveshah | well nextpnr and rapidwright is a possibility for ec2 | 11:31 |
daveshah | Given it needs a dcp rather than a bitstream anyway | 11:31 |
daveshah | But the router can't handle such big designs on a Xilinx arch at the moment | 11:31 |
daveshah | Some hacking of pblocks would also be needed to integrate with the fixed ec2 logic | 11:32 |
lukego | Seems like F1 instance FPGAs have PCIe gen3 x16 each. That would be 1Tbps of total PCIe bandwidth on a server with 8x FPGAs. Guessing that's NUMA so 0.5Tbps per node. Respectable. Even better if PCIe gen4 comes and doubles that. | 11:32 |
lukego | Hopefully you get full control of the pins attached to PCIe? e.g. to run some small custom PCIe DMA load generator. | 11:33 |
daveshah | I'm not sure, I know there is some logic that they provide | 11:34 |
sorear | no | 11:36 |
lukego | I suppose that being familiar with Amazon F1 doesn't hurt in itself because whoever uses that in production probably has deep pockets. | 11:36 |
sorear | there's a "shell" which includes all of the pcie bits, you get an AXI4 interface | 11:36 |
lukego | gack | 11:36 |
lukego | well, maybe fine, do you think it limits performance? | 11:36 |
sorear | I haven't actually tried but I doubt it's a bottleneck for streaming workloads | 11:37 |
lukego | Seems to be the spec https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md | 11:37 |
tpb | Title: aws-fpga/AWS_Shell_Interface_Specification.md at master · aws/aws-fpga · GitHub (at github.com) | 11:37 |
lukego | I'm in the packet networking domain so I'd be using the FPGAs as a proxy for NICs. Like, how much network traffic can an application keep up with, including all the messy bottlenecks like PCIe/RAM/NUMA. But also interested in seeing what can be done directly on the FPGAs. | 11:38 |
sorear | "full control of the pins attached to pcie" sounds like it would have fun security implications for f1.2xlarge | 11:39 |
daveshah | Yeah | 11:40 |
lukego | Maybe it's a blessing to have a simplified interface really. Just provided it's not de-optimized for the specific intended purpose. always a risk when benchmarking and looking for fundamental limits. | 11:43 |
lukego | I suppose that for dev purposes you would want a board with an equivalent PCIe-AXI4 bridge? Is that something straightforward? | 11:46 |
daveshah | The Xilinx Alveo U200/U250 are probably the closest to that | 11:47 |
lukego | not so easy on ECP5? | 11:47 |
daveshah | afaik no one has done full open PCIe on ECP5 | 11:47 |
lukego | Maybe it's worth being in the Xilinx universe when developing for F1 anyway | 11:47 |
daveshah | I suspect you will hit timing and utilisation limits on ECP5 quickly | 11:48 |
daveshah | for what you are doing | 11:48 |
lukego | Seems like ~$5K for those cards. Alternative might be to just develop against a dummy host / AXI4 stub? | 11:49 |
lukego | But... quite a lot of ground to cover before this becomes relevant. I have soldered a CAT5 cable onto my TinyFPGA BX and I don't plan to buy any more hardware until I've successfully spammed out 10BaseT ethernet packets from that. Just stubbornly want to write the packet generator in RTLIL :) | 11:50 |
daveshah | Sounds like fun | 11:51 |
lukego | (And I must remember that my most ambitious FPGA designs to date have been variations on "blink this LED" and I'll now come face to face with my own incompetence :-)) | 11:51 |
lukego | There's a neat dirty ethernet hack that I found here https://www.fpga4fun.com/10BASE-T0.html but I didn't like that their hard-coded packet seems to be synthesised as logic instead of ROM :) | 11:52 |
tpb | Title: fpga4fun.com - 10BASE-T FPGA interface 0 - A recipe to send Ethernet traffic (at www.fpga4fun.com) | 11:52 |
ZipCPU | 10BaseT? Why not 100BaseT? Or GbE? | 11:54 |
lukego | iCE40 with no SerDes | 11:54 |
lukego | This is a gating example. I'll allow myself to buy an ECP5-Versa board with 1G after I make this work :) | 11:54 |
*** kraiskil has joined #yosys | 11:55 | |
ZipCPU | 100BaseT should still work ... RMII uses four wires ganged together at 25 MHz. Shouldn't be much of a problem, and might be easier to find the hardware for. | 11:55 |
lukego | I also think it's a bit cheeky to be doing ethernet using generic I/O pins and maybe it's easier to get away with that at lower speeds? There's no PHY so no RMII just wires soldered between the cable and the FPGA | 11:56 |
lukego | (maybe I should have emphasised "dirty" over "neat" in my description ;-)) | 11:57 |
* ZipCPU shudders at the thought of not using a PHY | 11:57 | |
lukego | well it only needs to send one packet and I'll be happy. I'll put a UDP envelope that can get it routed to the other side of the world and that will make it seem like an accomplishment :) | 11:58 |
ZipCPU | Only if you can verify that the task was accomplished | 11:58 |
ZipCPU | ICMP is often easier for that purpose | 11:58 |
ZipCPU | Don't forget, though, that you can't do that without also supporting ARP .... :O | 11:58 |
lukego | I'm prepared to be dirty on this level :-) I'll skip ARP by hardcoding the DMAC and I'll receive the packet using tcpdump so it doesn't matter if the receiver likes it or not :) | 11:59 |
lukego | I have almost no experience with FPGAs but I have spent most of my adult life doing inadvisable things with IP networks :) | 12:00 |
lukego | btw love your blog! once I have my feet wet I am planning to go carefully through your intro to formal posts | 12:01 |
ZipCPU | Awesome! | 12:01 |
lukego | actually thanks for the RMII suggestion btw! maybe the next baby-step project for me would be to buy a 100M PHY and hook that up to my breadboard. That could be an intermediate step between this 10M hack and the ECP5 1G board. | 12:03 |
lukego | Maybe I could even scavenge a PHY from a derelict circuit board lying around here somewhere.. | 12:04 |
lukego | I have splurged on a new soldering iron that will arrive next week so I'm itching for such a project. I decided that I've gotten my money out of the ~$5 aliexpress one now. | 12:04 |
*** fsasm has joined #yosys | 12:36 | |
*** kraiskil has quit IRC | 13:51 | |
*** rohitksingh_ has quit IRC | 14:00 | |
*** fsasm has quit IRC | 14:04 | |
*** citypw has quit IRC | 15:31 | |
*** dh73 has joined #yosys | 15:51 | |
*** dys has joined #yosys | 15:54 | |
*** bwidawsk has quit IRC | 16:00 | |
*** dys has quit IRC | 16:00 | |
*** bwidawsk has joined #yosys | 16:05 | |
*** rohitksingh has joined #yosys | 16:06 | |
*** kraiskil has joined #yosys | 16:13 | |
*** rohitksingh has quit IRC | 16:14 | |
*** emeb has joined #yosys | 16:17 | |
*** d__ has quit IRC | 17:16 | |
*** bobzoidting has joined #yosys | 17:26 | |
*** ravenexp has quit IRC | 17:30 | |
*** dh73 has quit IRC | 18:14 | |
*** dys has joined #yosys | 18:25 | |
*** dys has quit IRC | 18:53 | |
*** dys has joined #yosys | 19:17 | |
*** dh73 has joined #yosys | 19:18 | |
emily | lukego: btw, I used to read your blog a bunch many years ago, so belated thanks from a fan of obscure languages and environments :) | 20:06 |
*** Jybz has quit IRC | 20:28 | |
*** nrossi has quit IRC | 20:44 | |
*** X-Scale` has joined #yosys | 21:08 | |
*** pie_ has joined #yosys | 21:09 | |
*** Ekho has quit IRC | 21:09 | |
*** indy_ has joined #yosys | 21:10 | |
*** indy has quit IRC | 21:12 | |
*** pie__ has quit IRC | 21:12 | |
*** X-Scale has quit IRC | 21:12 | |
*** ZipCPU has quit IRC | 21:12 | |
*** turq has quit IRC | 21:12 | |
*** fengling has quit IRC | 21:12 | |
*** X-Scale` is now known as X-Scale | 21:12 | |
*** ZipCPU has joined #yosys | 21:14 | |
*** fengling has joined #yosys | 21:15 | |
*** Ekho has joined #yosys | 21:18 | |
*** bobzoidting has quit IRC | 21:38 | |
*** svenn4 has joined #yosys | 22:12 | |
*** rohitksingh has joined #yosys | 22:23 | |
*** svenn4 has quit IRC | 22:30 | |
*** svenn4 has joined #yosys | 22:30 | |
whitequark | 09:40 < lukego> hm the .tex docs on master seem to talk more about meminit than http://www.clifford.at/yosys/files/yosys_manual.pdf | 22:31 |
whitequark | I got frustrated with the sparse docs and updated them :) | 22:31 |
*** ktemkin has quit IRC | 23:08 | |
*** snajpa has left #yosys | 23:32 |
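
Editorial sketch for the $alu discussion (08:30 to 08:39): a simplified bit-level model in Python of an add/subtract block with an X output and per-bit carry outputs, plus an unsigned less-than built "indirectly" from a subtraction. The port names (A, B, CI, BI, X, Y, CO) and their exact semantics are an assumption based on a reading of the Yosys cell library, not something stated in the log; the helper names `alu` and `unsigned_lt` are hypothetical.

```python
def alu(a, b, ci, bi, width):
    """Simplified model of an add/subtract block (assumed $alu-like ports).

    Returns (x, y, co): x is the XOR of the (possibly inverted) operands,
    y is the sum, co holds the carry out of every bit position.
    """
    mask = (1 << width) - 1
    a &= mask
    bb = (~b if bi else b) & mask   # BI=1 inverts B, turning the adder into a subtractor
    x = a ^ bb                      # the "propagate" term of a full adder, one per bit
    y = (a + bb + (1 if ci else 0)) & mask

    co = 0
    carry = 1 if ci else 0
    for i in range(width):          # ripple the carry through one bit at a time
        ai = (a >> i) & 1
        bi_bit = (bb >> i) & 1
        carry = (ai & bi_bit) | (carry & (ai ^ bi_bit))
        co |= carry << i
    return x, y, co


def unsigned_lt(a, b, width):
    """a < b for unsigned values, derived from a - b = a + ~b + 1."""
    _, _, co = alu(a, b, ci=1, bi=1, width=width)
    # No carry out of the top bit means the subtraction borrowed, i.e. a < b.
    return ((co >> (width - 1)) & 1) == 0


if __name__ == "__main__":
    width = 4
    x, y, co = alu(3, 5, ci=0, bi=0, width=width)
    carries_in = ((co << 1) | 0) & ((1 << width) - 1)
    # Each sum bit is the X bit XORed with the carry into that position.
    print(y == x ^ carries_in, unsigned_lt(3, 5, width), unsigned_lt(5, 3, width))
```

In this model each sum bit is the X bit XORed with the incoming carry, which is one way to read the "look at the circuit for a full adder" hint; whether a given FPGA maps that XOR into its carry primitive or into a LUT is a separate question, as the discussion notes.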
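
Editorial sketch for the bandwidth estimate at 11:32: a rough check of the "1Tbps of total PCIe bandwidth" figure, assuming PCIe gen3 signalling of 8 GT/s per lane with 128b/130b encoding, counted per direction and ignoring packet (TLP/DLLP) overhead. The card count and lane width come from the log; the function name `pcie_gen3_gbps` is hypothetical, and the "two NUMA nodes" split is the guess made in the chat, not a confirmed topology.

```python
GEN3_TRANSFERS_PER_S = 8e9     # 8 GT/s per lane (PCIe gen3)
GEN3_ENCODING = 128 / 130      # 128b/130b line encoding


def pcie_gen3_gbps(lanes):
    """Raw per-direction payload rate in Gbit/s, before protocol overhead."""
    return GEN3_TRANSFERS_PER_S * lanes * GEN3_ENCODING / 1e9


per_card = pcie_gen3_gbps(16)  # one FPGA card with a gen3 x16 link
total = 8 * per_card           # a server with 8 such cards
per_node = total / 2           # assuming the cards split evenly over two NUMA nodes

if __name__ == "__main__":
    print(f"per card: {per_card:.1f} Gbit/s")
    print(f"8 cards:  {total:.1f} Gbit/s")
    print(f"per node: {per_node:.1f} Gbit/s")
```

Under these assumptions the total lands slightly above 1 Tbit/s per direction, so the estimate in the chat is in the right range; achievable throughput through the AWS shell's AXI4 interface may well be lower.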