Thursday, 2019-11-21

*** tpb has joined #yosys00:00
*** dh73 has quit IRC00:06
*** rohitksingh has quit IRC00:10
*** emeb has quit IRC00:26
*** X-Scale has quit IRC01:03
*** X-Scale` has joined #yosys01:04
*** X-Scale` is now known as X-Scale01:05
*** rohitksingh has joined #yosys01:45
*** kraiskil has quit IRC01:56
*** citypw has joined #yosys03:41
*** nrossi has joined #yosys03:49
*** dnotq has quit IRC03:52
*** rektide has quit IRC03:56
*** dys has quit IRC06:51
*** dys has joined #yosys07:31
*** Jybz has joined #yosys08:02
*** dys has quit IRC08:10
*** freemint has joined #yosys08:10
*** freemint has quit IRC08:15
pepijndevosWhich operations does $alu implement? It seems just addition and subtraction?08:30
pepijndevosA building block supporting both binary addition/subtraction operations, and08:31
pepijndevosindirectly, comparison operations.08:31
whitequarkyes08:32
pepijndevosindirectly?? Meaning it wraps a subtract in some logic to do comparison?08:32
whitequarkI think so yes08:32
pepijndevosWhat I also don't get is the X output which always just seems to be the XOR of the inputs08:33
whitequarkyes, that's what it is08:34
whitequarktake a look at the circuit for a full adder08:34
pepijndevosBut is there any case where this XOR is actually part of the adder hardware? At least in all implementations I've seen it's literally an XOR, not part of the vendor ALU primitive.08:37
*** kraiskil has joined #yosys08:37
whitequarkno idea08:38
pepijndevosok08:38
whitequarknot on FPGAs for sure08:38
pepijndevosActually, I'm not sure what you mean with the full adder, because the ALU already has a carry in and carry out too, right?08:39
mwkpepijndevos: I don't think I understand your question really08:39
mwkthe thing about FPGAs is, they don't really have alu cells08:40
mwk(except the ones in DSP slices, but they're a completely different thing)08:40
pepijndevosRight, just a LUT with a hardware carry08:41
mwkwhat FPGAs really do is reuse as much of the LUT as possible, and have the bare minimum of special carry chain logic08:41
mwkand maybe an extra XOR gate (for xilinx)08:41
pepijndevoshmmm okay08:42
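
The exchange above about `$alu` (the X output being the XOR of the inputs, and comparison being done "indirectly" through a subtract) can be illustrated with a small behavioural model. This is only a sketch in Python: the port names A, B, CI, BI, X, Y and CO follow the `$alu` cell, but the functions `alu` and `unsigned_lt` are hypothetical helpers, and the comparison shown is just one textbook way to derive "less than" from a subtract, not necessarily the exact logic Yosys emits.

```python
def alu(a: int, b: int, ci: int, bi: int, width: int):
    """Behavioural sketch of an $alu-style cell (assumed semantics).

    Returns (X, Y, CO): X is the bitwise XOR of A and the optionally
    inverted B, Y is the sum, CO holds the carry out of every bit.
    """
    mask = (1 << width) - 1
    aa = a & mask
    bb = (~b if bi else b) & mask      # BI=1 inverts B, used for subtraction
    x = aa ^ bb                        # the "propagate" term of a full adder
    co = 0
    carry = ci & 1
    for i in range(width):
        ai = (aa >> i) & 1
        bj = (bb >> i) & 1
        carry = (ai & bj) | (ai & carry) | (bj & carry)  # full-adder carry
        co |= carry << i
    # Each sum bit is X XOR the carry coming into that bit position.
    y = x ^ (((co << 1) | (ci & 1)) & mask)
    return x, y, co


def unsigned_lt(a: int, b: int, width: int) -> bool:
    """a < b (unsigned) via a - b = a + ~b + 1: no carry out means a borrow."""
    _, _, co = alu(a, b, ci=1, bi=1, width=width)
    return ((co >> (width - 1)) & 1) == 0


print(alu(5, 3, ci=0, bi=0, width=4))   # add:      (6, 8, 7)
print(alu(5, 3, ci=1, bi=1, width=4))   # subtract: Y is 2
print(unsigned_lt(3, 5, 4), unsigned_lt(5, 3, 4))
```

In this model X is exactly the per-bit XOR that a full adder computes before it is combined with the incoming carry, which is why it appears as a separate output.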
pepijndevosSo what I'm thinking about is... the Gowin ALU supports special comparison modes, but there does not seem to be a straightforward way to map an $alu cell to them as the comparison is "indirect"08:45
daveshahRealistically they'll be just setting the LUT init to some value08:45
pepijndevosRight, but if Yosys just sets them to add/sub and then wraps extra logic around them for comparison, it could be less than ideal I suppose.08:46
daveshahI would be careful creating too much hard logic for comparison as it might prevent optimisation down the line08:47
pepijndevosBut then, IIRC I did not actually manage to make the vendor tools issue some of these modes, so maybe they are actually the same LUT contents.08:47
daveshahOlder Lattice FPGAs had these modes too08:47
daveshahBut they definitely didn't have any special hardware behind them beyond the LUT and carry logic08:47
pepijndevosRight08:47
mwkheh08:47
mwk$lt/$gt could actually be implemented in real clever ways on xilinx08:48
mwkusing the carry chain08:48
mwktoo bad we don't do it08:48
daveshahYeah, although there are the usual issues with the current mapping and optimisation, which mean that sort of stuff can deny optimisations elsewhere08:49
mwkfor comparison between two variables, you can do two bits per LUT; for comparison between a variable and an arbitrary constant, five bits per LUT08:49
mwkright :(08:49
lukegoAdded the yosys "show" GUI dependencies to the nixpkgs package if anybody is interested. was a little fiddly with icons. as a separate package to avoid pulling it all in with the compiler. https://github.com/NixOS/nixpkgs/pull/7385609:26
tpbTitle: yosys-withGui: New wrapper derivation with GUI runtime dependencies by lukego · Pull Request #73856 · NixOS/nixpkgs · GitHub (at github.com)09:26
lukegonewbie alert: How do I make a ROM at the RTLIL level? I feel like $memrd is the right primitive but I don't see how to supply an initial value.09:38
daveshahlukego: $meminit09:39
lukegohm the .tex docs on master seem to talk more about meminit than http://www.clifford.at/yosys/files/yosys_manual.pdf09:41
daveshahThey might have been improved since the 0.9 release, when that was probably generated09:42
lukegoIs there a good place to look for RTLIL examples to help interpret the docs? Or source is the best bet?09:43
daveshahWrite some Verilog and look at the RTLIL it produces09:44
lukegoI thought the whole point of RTLIL is that it lets me avoid learning Verilog ;-)09:44
daveshahRTLIL is not intended to be human writeable outside of testing09:45
lukego(I started with some Verilog code that did ROM-like things as a "switch" statement but that didn't get mapped onto memory, just logic)09:45
daveshahYou would want to use an initial block or $readmem[hb] to get $meminit cells09:45
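
One way to follow the `$readmem[hb]` suggestion is to generate the ROM contents as a hex text file from a script. The sketch below is Python and makes several assumptions: the helper name `to_readmemh`, the example bytes, and the one-byte-per-word layout are all hypothetical choices for illustration.

```python
def to_readmemh(data: bytes) -> str:
    """Render bytes as $readmemh-style text: one two-digit hex word per line."""
    return "".join(f"{byte:02x}\n" for byte in data)


# Hypothetical ROM contents; any byte string works here.
rom_bytes = bytes.fromhex("ffffffffffff0200")
rom_text = to_readmemh(rom_bytes)
print(rom_text, end="")
```

The resulting text could be saved as, say, `rom.hex` and loaded from an initial block with something like `$readmemh("rom.hex", rom)` on an 8-bit-wide memory. Whether the memory then ends up in block RAM or in logic depends on the synthesis flow and the target.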
lukegoI'm hoping that RTLIL can serve as a foundation like FIRRTL i.e. provide "simple as possible but no simpler" abstractions to build on top of. Like an instruction set so to speak. Long term goal is not to write code by hand.09:47
lukegoSo I'm looking at ilang as the "main thing" here and the Verilog frontend as some esoterica.09:47
lukegobut thanks I will chase these refs in the source :)09:47
daveshahThe main reason not to hand write rtlil is a lack of parameterisation or generate-for type structures09:48
daveshahBecause it only exists after the first parts of elaboration are done09:49
*** d__ has quit IRC09:49
lukegoRight. Maybe it's easiest to say that my goal is to write a frontend here, albeit I don't know much about how that will look yet, and so I'm trying to understand the "mid-end" for now09:49
daveshahI see09:49
lukegoI'm thinking I'll start with RTLIL and add basic macro-like abstractions on top of that, and see how far that gets me as a learning exercise.09:50
lukegobut I take the point not to expect a large corpus of hand-written RTLIL example programs09:50
*** d__ has joined #yosys09:51
emilylukego: nmigen is an existing compiles-to-RTLIL eDSL09:54
emilyhttps://github.com/m-labs/nmigen09:54
tpbTitle: GitHub - m-labs/nmigen: A refreshed Python toolbox for building complex digital hardware (at github.com)09:54
emilyyou can probably learn whatever you want to know from its source code09:54
lukegooh neat thanks emily. I know of nmigen but I didn't know it targets RTLIL.09:55
emilyhttps://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py09:55
tpbTitle: nmigen/rtlil.py at master · m-labs/nmigen · GitHub (at github.com)09:55
*** kraiskil has quit IRC10:17
*** rohitksingh has quit IRC10:19
*** rohitksingh has joined #yosys10:21
*** rohitksingh_ has joined #yosys10:26
*** rohitksingh has quit IRC10:27
lukegoOh hey :-) Suppose that I wanted to get my hands on a high-end FPGA like a Kintex UltraScale+. How do hacker types usually do that? Seems like buying it off the shelf for list price is not the answer10:32
daveshahHave a look for used bitcoin miner boards?10:34
daveshahvcu152510:34
daveshahand derivatives10:34
daveshahOr just use AWS F1 if you don't want to do any custom IO interfacing10:34
lukegoI'm mostly interested in I/O. My dream setup right now would be an AMD Threadripper with all of its PCIe lanes connected to FPGA(s)10:35
lukegoSo that's ~100 PCIe gen410:35
daveshahWell if you only care about PCIe then AWS F1 might work10:35
daveshahBut it's nowhere near that many lanes10:36
daveshahSomething like x4 or x8 iirc10:36
lukegooh hm that's a really good point thanks!10:36
daveshahYou can pay more for a box with something like 8 FPGA cards too10:36
lukegoI'm really interested in networking applications where the FPGA would be hooked up to a bunch of 100GbE ports, but I think it's reasonable to treat that as a separate problem i.e. do the PCIe I/O and the Ethernet I/O work separately.10:37
lukegoThat sounds extremely promising. I'd like to do things like benchmark the CPU from the PCIe side to see where the pain points are on the uncore/cache/ram10:37
lukegoand an EC2 machine with 8xFPGA is probably a perfect solution for 0.1% the cost of buying the equipment for a 10 minute test.10:38
daveshahThe one with 8 cards is $13/hr10:38
daveshahEven if you used it 24/7 it would be good value compared to buying in low qty10:39
daveshahSuch is the discount that Amazon gets10:39
lukegoSo even cheaper because I don't think you get that hardware for $13K :)10:39
lukegoYeah I've been told privately that I would faint if I saw what Amazon pay for FPGAs.10:39
daveshahI would guess it is about 10-20x cheaper than DigiKey list10:40
lukegoSeems like an anti-hacker conspiracy when BigCo and universities all get 10x discount but oh well10:40
lukegoI suppose that if you found the right approach to Xilinx etc they would shower you with hardware e.g. if you are consulting on a project for one of their big customers/prospects.10:41
lukegobut it's exhausting to play that game :)10:41
daveshahEven iCE40s have at least a 5x discount in 1e6 quantity10:41
daveshahI've been there before and it's easier said than done10:41
lukegoQuantity sure makes a difference. I wanted to buy an ECP5 based NIC but it costs like $600 and that just seems like a lot compared with the FPGA, even though I know that's the reality of producing the boards for a really low quantity market.10:42
daveshahThe first quotation I saw as a small company in the UK a few years ago for 1k Zynqs was above DigiKey single pricing10:42
daveshahFor the smaller parts it seems like 100k is where it gets good10:43
lukegothanks muchly for the tips10:43
lukegolunch time here10:43
daveshahEnjoy10:45
lukegoSo follow-up question if I may :-) Supposing I would ultimately aim for EC2 as a high-end FPGA env. How might I prototype in a meaningful way e.g. with affordable hardware that I can use as a proxy for the real thing? And could I use Yosys/nextpnr for F1 or is that a different universe?11:13
mwkyosys, yes; nextpnr, no11:18
sorearyou'll likely have a rather small number of pins you can connect to pcie lanes per fpga11:19
mwkthat too11:20
mwkunless you get one of the outrageously expensive ones11:20
sorearI haven't run the numbers for ultrascale+ but on ecp5-5G the aggregate throughput of the "slow" I/Os is about 6x the aggregate throughput of the fast IOs/SERDESes because there are so many more of the former11:20
daveshahwell nextpnr and rapidwright is a possibility for ec211:31
daveshahGiven it needs a dcp rather than a bitstream anyway11:31
daveshahBut the router can't handle such big designs on a Xilinx arch at the moment11:31
daveshahSome hacking of pblocks would also be needed to integrate with the fixed ec2 logic11:32
lukegoSeems like F1 instance FPGAs have PCIe gen3 x16 each. That would be 1Tbps of total PCIe bandwidth on a server with 8x FPGAs. Guessing that's NUMA so 0.5Tbps per node. Respectable. Even better if PCIe gen4 comes and doubles that.11:32
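
The 1Tbps estimate above can be checked with a little arithmetic. This Python sketch assumes the standard raw line rates (8 GT/s per lane for gen3, 16 GT/s for gen4) and 128b/130b encoding, counts one direction only, and ignores packet and protocol overhead, so real throughput would be lower. The function name `pcie_gbps` is hypothetical.

```python
# Raw per-lane signalling rate in GT/s for each PCIe generation considered.
GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line code used by gen3 and gen4


def pcie_gbps(gen: int, lanes: int) -> float:
    """Usable bit rate in Gb/s, one direction, before protocol overhead."""
    return GT_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes


per_card = pcie_gbps(3, 16)
cards = 8
print(f"gen3 x16 per card: {per_card:.1f} Gb/s")
print(f"{cards} cards, gen3:  {cards * per_card / 1000:.2f} Tb/s")
print(f"{cards} cards, gen4:  {cards * pcie_gbps(4, 16) / 1000:.2f} Tb/s")
```

Under these assumptions eight gen3 x16 cards come to roughly 1 Tb/s in each direction, and gen4 would double that.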
lukegoHopefully you get full control of the pins attached to PCIe? e.g. to run some small custom PCIe DMA load generator.11:33
daveshahI'm not sure, I know there is some logic that they provide11:34
sorearno11:36
lukegoI suppose that being familiar with Amazon F1 doesn't hurt in itself because whoever uses that in production probably has deep pockets.11:36
sorearthere's a "shell" which includes all of the pcie bits, you get an AXI4 interface11:36
lukegogack11:36
lukegowell, maybe fine, do you think it limits performance?11:36
sorearI haven't actually tried but I doubt it's a bottleneck for streaming workloads11:37
lukegoSeems to be the spec https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md11:37
tpbTitle: aws-fpga/AWS_Shell_Interface_Specification.md at master · aws/aws-fpga · GitHub (at github.com)11:37
lukegoI'm in the packet networking domain so I'd be using the FPGAs as a proxy for NICs. Like, how much network traffic can an application keep up with, including all the messy bottlenecks like PCIe/RAM/NUMA. But also interested in seeing what can be done directly on the FPGAs.11:38
sorear"full control of the pins attached to pcie" sounds like it would have fun security implications for f1.2xlarge11:39
daveshahYeah11:40
lukegoMaybe it's a blessing to have a simplified interface really. Just provided it's not de-optimized for the specific intended purpose. always a risk when benchmarking and looking for fundamental limits.11:43
lukegoI suppose that for dev purposes you would want a board with an equivalent PCIe-AXI4 bridge? Is that something straightforward?11:46
daveshahThe Xilinx Alveo U200/U250 are probably the closest to that11:47
lukegonot so easy on ECP5?11:47
daveshahafaik no one has done full open PCIe on ECP511:47
lukegoMaybe it's worth being in the Xilinx universe when developing for F1 anyway11:47
daveshahI suspect you will hit timing and utilisation limits on ECP5 quickly11:48
daveshahfor what you are doing11:48
lukegoSeems like ~$5K for those cards. Alternative might be to just develop against a dummy host / AXI4 stub?11:49
lukegoBut... quite a lot of ground to cover before this becomes relevant. I have soldered a CAT5 cable onto my TinyFPGA BX and I don't plan to buy any more hardware until I've successfully spammed out 10BaseT ethernet packets from that. Just stubbornly want to write the packet generator in RTLIL :)11:50
daveshahSounds like fun11:51
lukego(And I must remember that my most ambitious FPGA designs to date have been variations on "blink this LED" and I'll now come face to face with my own incompetence :-))11:51
lukegoThere's a neat dirty ethernet hack that I found here https://www.fpga4fun.com/10BASE-T0.html but I didn't like that their hard-coded packet seems to be synthesised as logic instead of ROM :)11:52
tpbTitle: fpga4fun.com - 10BASE-T FPGA interface 0 - A recipe to send Ethernet traffic (at www.fpga4fun.com)11:52
ZipCPU10BaseT?  Why not 100BaseT?  Or GbE?11:54
lukegoiCE40 with no SerDes11:54
lukegoThis is a gating example. I'll allow myself to buy an ECP5-Versa board with 1G after I make this work :)11:54
*** kraiskil has joined #yosys11:55
ZipCPU100BaseT should still work ... RMII uses four wires ganged together at 25 MHz.  Shouldn't be much of a problem, and might be easier to find the hardware for.11:55
lukegoI also think it's a bit cheeky to be doing ethernet using generic I/O pins and maybe it's easier to get away with that at lower speeds? There's no PHY so no RMII just wires soldered between the cable and the FPGA11:56
lukego(maybe I should have emphasised "dirty" over "neat" in my description ;-))11:57
* ZipCPU shudders at the thought of not using a PHY11:57
lukegowell it only needs to send one packet and I'll be happy. I'll put a UDP envelope that can get it routed to the other side of the world and that will make it seem like an accomplishment :)11:58
ZipCPUOnly if you can verify that the task was accomplished11:58
ZipCPUICMP is often easier for that purpose11:58
ZipCPUDon't forget, though, that you can't do that without also supporting ARP .... :O11:58
lukegoI'm prepared to be dirty on this level :-) I'll skip ARP by hardcoding the DMAC and I'll receive the packet using tcpdump so it doesn't matter if the receiver likes it or not :)11:59
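
The plan above (a single UDP packet with a hard-coded destination MAC, so no ARP is needed) amounts to precomputing one fixed byte string. The Python sketch below builds such a frame: Ethernet header, IPv4 header with checksum, UDP header, padding to the 60-byte minimum, and the CRC-32 frame check sequence. All addresses, ports and the payload are hypothetical placeholders, `build_frame` and `ipv4_checksum` are illustrative names, and the preamble and start-of-frame delimiter that precede the frame on the wire are not included.

```python
import struct
import zlib


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frame(dst_mac, src_mac, src_ip, dst_ip, sport, dport, payload):
    udp_len = 8 + len(payload)
    # UDP checksum 0 means "not computed", which IPv4 permits.
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    # Version/IHL, TOS, total length, ID, flags/fragment, TTL, protocol 17 (UDP),
    # checksum placeholder, source and destination addresses.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 64, 17, 0,
                     src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    frame = dst_mac + src_mac + struct.pack("!H", 0x0800) + ip + udp
    frame = frame.ljust(60, b"\x00")            # pad to minimum frame size
    fcs = struct.pack("<I", zlib.crc32(frame))  # FCS, least significant byte first
    return frame + fcs


frame = build_frame(
    dst_mac=bytes.fromhex("020000000002"),  # hypothetical hard-coded DMAC
    src_mac=bytes.fromhex("020000000001"),  # hypothetical source MAC
    src_ip=bytes([192, 0, 2, 1]),           # documentation address range
    dst_ip=bytes([192, 0, 2, 2]),
    sport=1024, dport=1024, payload=b"hello",
)
print(len(frame), frame.hex())
```

The resulting bytes are the kind of constant that could be stored in a ROM and shifted out by the FPGA, and they can be compared against what tcpdump captures on the receiving side.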
lukegoI have almost no experience with FPGAs but I have spent most of my adult life doing inadvisable things with IP networks :)12:00
lukegobtw love your blog! once I have my feet wet I am planning to go carefully through your intro to formal posts12:01
ZipCPUAwesome!12:01
lukegoactually thanks for the RMII suggestion btw! maybe the next baby-step project for me would be to buy a 100M PHY and hook that up to my breadboard. That could be an intermediate step between this 10M hack and the ECP5 1G board.12:03
lukegoMaybe I could even scavenge a PHY from a derelict circuit board lying around here somewhere..12:04
lukegoI have splurged on a new soldering iron that will arrive next week so I'm itching for such a project. I decided that I've gotten my money out of the ~$5 aliexpress one now.12:04
*** fsasm has joined #yosys12:36
*** kraiskil has quit IRC13:51
*** rohitksingh_ has quit IRC14:00
*** fsasm has quit IRC14:04
*** citypw has quit IRC15:31
*** dh73 has joined #yosys15:51
*** dys has joined #yosys15:54
*** bwidawsk has quit IRC16:00
*** dys has quit IRC16:00
*** bwidawsk has joined #yosys16:05
*** rohitksingh has joined #yosys16:06
*** kraiskil has joined #yosys16:13
*** rohitksingh has quit IRC16:14
*** emeb has joined #yosys16:17
*** d__ has quit IRC17:16
*** bobzoidting has joined #yosys17:26
*** ravenexp has quit IRC17:30
*** dh73 has quit IRC18:14
*** dys has joined #yosys18:25
*** dys has quit IRC18:53
*** dys has joined #yosys19:17
*** dh73 has joined #yosys19:18
emilylukego: btw, I used to read your blog a bunch many years ago, so belated thanks from a fan of obscure languages and environments :)20:06
*** Jybz has quit IRC20:28
*** nrossi has quit IRC20:44
*** X-Scale` has joined #yosys21:08
*** pie_ has joined #yosys21:09
*** Ekho has quit IRC21:09
*** indy_ has joined #yosys21:10
*** indy has quit IRC21:12
*** pie__ has quit IRC21:12
*** X-Scale has quit IRC21:12
*** ZipCPU has quit IRC21:12
*** turq has quit IRC21:12
*** fengling has quit IRC21:12
*** X-Scale` is now known as X-Scale21:12
*** ZipCPU has joined #yosys21:14
*** fengling has joined #yosys21:15
*** Ekho has joined #yosys21:18
*** bobzoidting has quit IRC21:38
*** svenn4 has joined #yosys22:12
*** rohitksingh has joined #yosys22:23
*** svenn4 has quit IRC22:30
*** svenn4 has joined #yosys22:30
whitequark09:40 < lukego> hm the .tex docs on master seem to talk more about meminit than http://www.clifford.at/yosys/files/yosys_manual.pdf22:31
whitequarkI got frustrated with the sparse docs and updated them :)22:31
*** ktemkin has quit IRC23:08
*** snajpa has left #yosys23:32
