*** tpb has joined #yosys | 00:00 | |
*** lutsabound has quit IRC | 00:06 | |
*** emeb has quit IRC | 01:12 | |
*** seldridge has quit IRC | 02:03 | |
*** seldridge has joined #yosys | 02:51 | |
*** leviathan has joined #yosys | 03:02 | |
*** kuldeep has quit IRC | 03:17 | |
*** kuldeep has joined #yosys | 03:24 | |
*** rohitksingh_work has joined #yosys | 03:55 | |
*** seldridge has quit IRC | 05:33 | |
*** emeb_mac has quit IRC | 06:58 | |
*** dys has joined #yosys | 07:24 | |
*** dys has quit IRC | 07:59 | |
*** mjoldfield has quit IRC | 08:18 | |
*** leviathan has quit IRC | 08:34 | |
*** mjoldfield has joined #yosys | 08:38 | |
*** m4ssi has joined #yosys | 08:41 | |
*** mjoldfield has quit IRC | 08:54 | |
*** mjoldfield has joined #yosys | 08:55 | |
*** mjoldfield has quit IRC | 09:17 | |
*** mjoldfield has joined #yosys | 09:18 | |
*** jfng has quit IRC | 09:19 | |
*** jayaura has quit IRC | 09:19 | |
*** Rixon[m] has quit IRC | 09:19 | |
*** nrossi has quit IRC | 09:19 | |
*** danieljabailey has quit IRC | 09:22 | |
*** danieljabailey has joined #yosys | 09:24 | |
*** nrossi has joined #yosys | 09:59 | |
*** jfng has joined #yosys | 09:59 | |
*** leviathan has joined #yosys | 10:57 | |
*** maikmerten has joined #yosys | 11:00 | |
*** rohitksingh_work has quit IRC | 12:56 | |
*** lutsabound has joined #yosys | 13:18 | |
*** rohitksingh has quit IRC | 13:27 | |
*** rohitksingh has joined #yosys | 14:16 | |
maikmerten | can BRAM-inference for iCE40 also work when reading with a blocking assignment? I'm trying to define a cache that determines a cache hit/miss within one clock cycle, so I need the tag information available in that clock tick. I guess that's not something BRAM can provide? | 14:32 |
---|---|---|
maikmerten | https://paste.debian.net/1049849/ | 14:32 |
tpb | Title: debian Pastezone (at paste.debian.net) | 14:32 |
daveshah | maikmerten: no, that won't work on ice40 | 14:44 |
daveshah | you'd have to find an fpga with distributed ram | 14:44 |
maikmerten | thanks :-) | 14:44 |
maikmerten | I guess a cache-lookup cycle it'll be, then ;-) | 14:45 |
sorear | also, distributed ram tends to be much lower capacity than block ram on chips with both | 14:48 |
*** AlexDaniel has quit IRC | 14:55 | |
*** seldridge has joined #yosys | 14:58 | |
daveshah | sorear: not always a downside in some cases like register files, where a whole bram would be a waste anyway | 14:59 |
sorear | indeed | 15:00 |
sorear | but a cache memory is more likely to be sized to use the ~entire chip | 15:00 |
sorear | especially on non-UP ice40 where you only have 16KB total | 15:00 |
*** leviathan has quit IRC | 15:13 | |
*** leviathan has joined #yosys | 15:18 | |
maikmerten | for getting my feet wet with caches, I'm going for a cache with 256 entries, each 32 bit wide (which fits nicely into 2 iCE40 BRAMs) and 256 16 bit tags (1 bit valid, 15 bit address) | 15:19 |
maikmerten | and then work my way up ;-) | 15:19 |
maikmerten | will only accept aligned 32-bit words... essentially geared towards being an instruction cache | 15:20 |
sorear | direct-mapped cache covering 32 MB of address space? | 15:21 |
maikmerten | yeah, it's going to be a horrible cache-thrash fest | 15:23 |
maikmerten | direct-mapped, write-through, massive 256 words in capacity... what's not to like? ;-) | 15:24 |
sorear | what sort of external memory | 15:25 |
maikmerten | 8-bit wide external SRAM | 15:25 |
*** rohitksingh has quit IRC | 15:26 | |
maikmerten | so currently my CPU needs in total about 6 cycles to fetch the next instruction | 15:26 |
maikmerten | this is basically the RISC-V equivalent to an Intel 8088 ;-) | 15:27 |
maikmerten | (but that one at least had a prefetch queue) | 15:27 |
*** lutsabound has quit IRC | 15:28 | |
sorear | what can the external SRAM do in terms of latency/throughput? | 15:28 |
maikmerten | I'm currently using the SRAM with one cycle latency (present address at one clock edge, get the data one cycle later) | 15:31 |
maikmerten | the chip itself can do 10ns cycle times | 15:32 |
maikmerten | but due to the board layout and connectors and because I'm sampling data mid-cycle, I can only drive it at ~30 MHz | 15:32 |
sorear | async SRAM? | 15:33 |
maikmerten | for now I've settled for 25.125 MHz, which happens to be very close to 640x480@60Hz VGA timings | 15:33 |
maikmerten | yes | 15:33 |
maikmerten | 512Kx8 | 15:33 |
sorear | i guess you could do 4-bit color and dedicate 50% of the memory cycles to scan-out | 15:34 |
maikmerten | on this extension board for the iCE40 HX8K eval board: https://github.com/maikmerten/hx8k-breakout-extension | 15:34 |
tpb | Title: GitHub - maikmerten/hx8k-breakout-extension: A PCB with SRAM, buttons, LEDs and some pmod-compatible connectors for the Lattice HX8K Breakout Board (at github.com) | 15:35 |
maikmerten | yes, with some cleverness I guess one could drive VGA from that SRAM as well. | 15:36 |
sorear | oh I thought you were saying you were already going to do VGA | 15:39 |
sorear | although real-time chargen/sprites is also an option | 15:39 |
maikmerten | well, in the future I might want to do VGA. Back when I did something similar in VHDL (pre-yosys), I already had a simple RISC-V SoC with VGA | 15:59 |
maikmerten | that one generated 40x25 characters | 15:59 |
maikmerten | with a chargen for 256 8x8 pixel chars | 16:00 |
maikmerten | which is rather compact and can be done in BRAM | 16:00 |
*** seldridge has quit IRC | 16:09 | |
*** AlexDaniel has joined #yosys | 16:24 | |
*** rohitksingh has joined #yosys | 16:29 | |
*** leviathan has quit IRC | 16:56 | |
*** seldridge has joined #yosys | 17:16 | |
*** m4ssi has quit IRC | 17:17 | |
*** seldridge has quit IRC | 17:31 | |
*** seldridge has joined #yosys | 17:55 | |
*** rohitksingh has quit IRC | 18:02 | |
*** ZipCPU has quit IRC | 18:06 | |
*** dys has joined #yosys | 18:33 | |
maikmerten | okay, a first implementation of my "aligned word only", "only word reads get into cache", direct-mapped, 256 entry cache works now | 19:01 |
maikmerten | dhrystones go from 4105 per second to 4655 per second, a 13.4% performance increase | 19:02 |
maikmerten | resource util goes from 1857 LCs (no cache) to 1938 LCs (with cache), BRAMs from 5 to 8 (of 32) | 19:06 |
maikmerten | (so a 4.4% increase of LC usage for 13.4% better performance... that's ok I guess) | 19:07 |
sorear | does it cache instructions, data, or both | 19:08 |
maikmerten | every aligned word read gets offered to the cache | 19:09 |
maikmerten | so no proper separation | 19:09 |
maikmerten | also, every write invalidates the respective cache line | 19:10 |
maikmerten | so the cache is a) small and b) gets invalidated a lot | 19:10 |
*** maikmerten has quit IRC | 19:23 | |
*** seldridge has quit IRC | 19:35 | |
*** seldridge has joined #yosys | 19:56 | |
*** develonepi3 has quit IRC | 20:57 | |
*** m4ssi has joined #yosys | 21:48 | |
*** m4ssi has quit IRC | 22:18 | |
*** pie__ has quit IRC | 22:35 | |
*** seldridge has quit IRC | 23:13 |
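
The cache discussed in the log (direct-mapped, 256 entries of one aligned 32-bit word each, 16-bit tag entries with 1 valid bit and 15 address bits, lines invalidated on writes) can be sketched as a small behavioural model. This is only an illustration in Python, not maikmerten's actual HDL: the names `split_address`, `DirectMappedCache`, `fill`, and `invalidate` are made up here, and the choice to invalidate the indexed line on every write without comparing tags is an assumption, since the log does not say how that is done.

```python
# Behavioural sketch of the cache described in the log (names are hypothetical).
OFFSET_BITS = 2    # aligned 32-bit words: the two low address bits are ignored
INDEX_BITS = 8     # 256 entries
TAG_BITS = 15      # 15 address bits per tag, plus 1 valid bit = 16-bit tag entry
BRAM_BITS = 4096   # one iCE40 block RAM holds 4 kbit

# Address range the tag/index/offset split can distinguish: 2**25 bytes = 32 MiB.
ADDRESS_SPAN = 1 << (TAG_BITS + INDEX_BITS + OFFSET_BITS)

# Storage: 256 x 32 bit data (2 BRAMs) and 256 x 16 bit tags (1 BRAM).
DATA_BRAMS = (256 * 32) // BRAM_BITS
TAG_BRAMS = (256 * 16) // BRAM_BITS


def split_address(addr):
    """Split a byte address into (tag, index) for the direct-mapped lookup."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index


class DirectMappedCache:
    def __init__(self):
        entries = 1 << INDEX_BITS
        self.valid = [False] * entries
        self.tags = [0] * entries
        self.data = [0] * entries

    def lookup(self, addr):
        """Return the cached word on a hit, or None on a miss."""
        tag, index = split_address(addr)
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index]
        return None

    def fill(self, addr, word):
        """Offer a word that was just read from memory to the cache."""
        if addr % 4 != 0:
            return  # only aligned word reads are cached
        tag, index = split_address(addr)
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = word

    def invalidate(self, addr):
        """Called on every write. Assumption: the indexed line is dropped
        unconditionally, without comparing tags."""
        _, index = split_address(addr)
        self.valid[index] = False


cache = DirectMappedCache()
cache.fill(0x000, 0xAAAA)
cache.fill(0x400, 0xBBBB)   # same index as 0x000, different tag: evicts it
print(cache.lookup(0x000))  # None, the conflicting fill replaced the line
```

Addresses 1 KiB apart share an index, which is the thrashing behaviour joked about in the log. The constants also reproduce the numbers quoted there: a 32 MiB span, and 2 + 1 = 3 extra BRAMs (5 to 8).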