*** tpb has joined #opencores | 00:00 | |
*** O01eg has quit IRC | 05:01 | |
*** book` has quit IRC | 10:22 | |
*** book` has joined #opencores | 10:25 | |
*** book` has joined #opencores | 10:28 | |
*** mardinator_ has joined #opencores | 20:26 | |
mardinator_ | hi, has anyone used verilator before? | 20:27 |
---|---|---|
ZipCPU | mardinator_: Yes. I use it regularly | 20:32 |
mardinator_ | well, i did get things generated for c++ | 20:33 |
mardinator_ | but it complains about not having a main function | 20:33 |
ZipCPU | That part you have to build | 20:34 |
ZipCPU | Check out lesson 1 of https://zipcpu.com/tutorial | 20:34 |
tpb | Title: Verilog Beginner's Tutorial (at zipcpu.com) | 20:34 |
mardinator_ | but I was able to get all the info I needed from the c++ files too, just figured | 20:34 |
ZipCPU | It goes through generating a "main" function | 20:34 |
mardinator_ | ok, thanks | 20:34 |
mardinator_ | i just think that people have less trouble to read VCD files | 20:35 |
ZipCPU | There's also a decent article for more advanced designs with Verilator at zipcpu.com/ with the title, "Taking a new look at verilator" | 20:35 |
mardinator_ | with say gtkwave | 20:35 |
ZipCPU | You can get VCD files from Verilator--I do it all the time | 20:35 |
mardinator_ | ZipCPU: yep, but i think not before i will be able to compile the c++ files, i.e when i get the main function right | 20:36 |
mardinator_ | ZipCPU: btw. have you used icarus verilog? | 20:36 |
ZipCPU | Yeah ... you not only need the main function, but the main function has to call some trace generating functions as well | 20:36 |
mardinator_ | i did that already | 20:37 |
mardinator_ | #include "Vvgpr_tb.h" #include "verilated.h" #include "verilated_vcd_c.h" int main(int argc, char **argv, char **env) { int i; int clk; Verilated::commandArgs(argc, argv); // init top verilog instance main* top = new main; | 20:37 |
ZipCPU | Did you call verilator with the --trace option? | 20:37 |
mardinator_ | and ran the stuff, as Verilator -exe main.cpp ... | 20:38 |
mardinator_ | yeah...listen i have not yet looked your tutorial, maybe i did something wrong | 20:38 |
ZipCPU | Did you do the: VerilatedVcdC *tfp = new VerilatedVcdC; tb->trace(tfp, 99); tfp->open("tracename.vcd"); ... ? | 20:38 |
mardinator_ | i used another one before | 20:38 |
ZipCPU | Trace generation is covered in lesson two ... ;) | 20:38 |
mardinator_ | yep | 20:38 |
ZipCPU | In between eval's, you then need to call tfp->dump(clock_number*clock_period); | 20:39 |
mardinator_ | but it looks like , icarus verilog did not generate the bits that i was interested in | 20:39 |
ZipCPU | There's some sticky details about how to do it too | 20:39 |
mardinator_ | so verilator does something more realistically | 20:39 |
ZipCPU | It can be very realistic if you would like | 20:39 |
mardinator_ | maybe it is cause of some coverage issue, but the flops i was interested in miaow, did not get dumped by icarus verilog at all it seems, but only with verilator | 20:40 |
ZipCPU | Curious. Don't know what to make of it | 20:41 |
mardinator_ | those are issue queues of miaow, 1024 instances of 214 bits of decode_wr_data | 20:41 |
mardinator_ | per compute unit, verilator shows them, but iverilog does not in its trace | 20:41 |
mardinator_ | or generated a.out | 20:41 |
mardinator_ | those instances show up, as hw instances though marked as unused | 20:45 |
mardinator_ | maybe icarus verilog only generates the used ones if wired across modules | 20:46 |
ZipCPU | Some synthesizers support a (* keep *) attribute that can be used to make certain particular values of interest remain within the design | 20:46 |
mardinator_ | anyhow, i am slow today, i try to see if the VCD file matches my theory, so reading the tutorial | 20:50 |
mardinator_ | grep -in ".*vtemp3472.*" Vvgpr_tb__Trace.cpp -R | wc -l | 21:11 |
mardinator_ | __Vtemp3472[6U] = ((0xffc00000U & (vlTOPp->vgpr_tb__DOT__issue_test__DOT__issue__DOT__scoreboard__DOT__decode_wr_data[0U] << 0x16U)) | 21:12 |
mardinator_ | such lines, round about 256 of them and four multiplexers | 21:13 |
mardinator_ | and a selector based of SIMD it seems | 21:13 |
mardinator_ | they are all missing from icarus verilog dump | 21:13 |
mardinator_ | ZipCPU: ouh i finally managed to parse your tutorial, i mean understand it... | 21:19 |
mardinator_ | it appears that i have to link the static library from the generated functions together with hand generated cpp | 21:20 |
mardinator_ | to make a final executable, this is the step i did not do before, yeah makes sense | 21:20 |
mardinator_ | it complained that my class had no clk and rst members, i looked those were regs in the testbench | 21:36 |
mardinator_ | trying to change them to wires | 21:36 |
mardinator_ | got the dump of VCD, but i can not see those lines included in the VCD | 22:04 |
mardinator_ | not sure how it boils up, if my theories were right, need to read about what that Vvgpr_tb.cpp vs. Vvgpr_tb__Trace.cpp is about... | 22:17 |
mardinator_ | ZipCPU: any idea if the trace file could be the full hardware, vs the other one just for active? | 22:18 |
ZipCPU | mardinator_: I'm back, now reading backlog | 22:42 |
mardinator_ | i did get the VCD dump, but trace files show lot more hw resources than that of the module file | 22:43 |
ZipCPU | I'm not sure I follow ... what would be the difference between the "full hardware" and the "just for active"? | 22:43 |
mardinator_ | me either :D | 22:49 |
mardinator_ | but i hypothesized, that maybe some parallel in parallel out methods do not show inactive instances | 22:50 |
mardinator_ | for an example | 22:50 |
mardinator_ | if the trace detects that register file instance 1011 is active in decimal reg 1011 from 1024 | 22:51 |
mardinator_ | it generates the trace instances only for that one | 22:51 |
mardinator_ | i.e 1011 | 22:51 |
ZipCPU | Hmm ... | 22:53 |
ZipCPU | let me try this ... is the register you are hoping to find within the trace defined within a generate block? | 22:53 |
mardinator_ | hmm, actually it is not | 22:54 |
mardinator_ | it is from arrays of instances from the regfile | 22:55 |
mardinator_ | ouh...but... | 22:55 |
mardinator_ | i thought about this myself before also, actually yeah it could be | 22:56 |
ZipCPU | To view a register that's defined within a generate block, you need to make certain that the generate block has a name | 22:56 |
mardinator_ | cause yeah to the decode_wr_data alu module feedsback done and wfid which are selectors | 22:57 |
mardinator_ | they originate from generated module | 22:57 |
mardinator_ | ZipCPU: i do not think their generate block has names though | 22:58 |
ZipCPU | That's easy enough to fix | 22:59 |
mardinator_ | https://github.com/VerticalResearchGroup/miaow/blob/master/src/verilog/rtl/alu/valu.v | 22:59 |
tpb | Title: miaow/valu.v at master · VerticalResearchGroup/miaow · GitHub (at github.com) | 22:59 |
mardinator_ | evaluate yourself, the generate block starts from there | 22:59 |
ZipCPU | So following line 41, add a: begin : GEN_SIMD // and then before 58, add an else, then after 58 add a: begin : GEN_SIMF, etc. | 23:01 |
mardinator_ | hmm, ok, but this can not be the cause, cause my testbenches do not trigger the alu module in, as i inspected | 23:02 |
mardinator_ | and the register file does not seem to use generate blocks, it uses xilinx block_ram, which i downloaded myself | 23:08 |
mardinator_ | cause the miaow1.0 ones did not compile, and i never ever understood why | 23:08 |
ZipCPU | Ok, time to look at the .h file | 23:10 |
ZipCPU | Are the variables of interest in the .h file? | 23:10 |
ZipCPU | Also, Verilator uses the (* keep *) attribute. You might want to add it in | 23:10 |
ZipCPU | Further, if you are struggling to know what's going on, feel free to play with the .cpp file--adding in whatever debugging statements you need. | 23:11 |
mardinator_ | it is not in particular the variables that worry me, variables are all spotted in VCD | 23:15 |
mardinator_ | each and every one of them are grabbed and pulled in | 23:15 |
mardinator_ | what looks to be the problem, is that some of the instances are missing for the same variable names | 23:15 |
ZipCPU | Check the .h file to see 1) if they are still in the Verilated code, and 2) what their names are | 23:17 |
*** mardinator_ has quit IRC | 23:18 | |
*** mardinator_ has joined #opencores | 23:19 | |
mardinator_ | power cord came out, and laptop turned off.. i.e battery is dead for my hp | 23:20 |
ZipCPU | Cap'n! The dilithium crystals are ... | 23:21 |
mardinator_ | i read somewhere, that the Chinese sell high amounts of vanadium which has smaller density for energy storage | 23:24 |
mardinator_ | anyways, but...it almost appears for some reason the decode_wr_data.. gets the instances from wavefronts only 40maximum of them in the module file | 23:25 |
mardinator_ | however in the trace, they seem to crap in those instances based of the regfiles | 23:25 |
ZipCPU | Are you working with your own design, or someone else's? | 23:29 |
mardinator_ | yeah someones elses, and it is very very complex design | 23:29 |
ZipCPU | What's it supposed to do? | 23:29 |
mardinator_ | which is the problem for me, cause it is sometimes hard to grasp when i have important dilemmas | 23:30 |
mardinator_ | it is a full compute overlay circuit based of GCN gpu | 23:30 |
mardinator_ | for fpgas | 23:30 |
mardinator_ | instruction set is GCN that of amd radeon | 23:30 |
ZipCPU | GCN? | 23:30 |
mardinator_ | basically it is Chinese clone | 23:30 |
ZipCPU | Ooohh ... wow, okay | 23:30 |
mardinator_ | graphics core next | 23:30 |
ZipCPU | That is a big design | 23:31 |
ZipCPU | And so you have a lot of identical compute nodes, because it's a graphics core? | 23:31 |
mardinator_ | actually it implements in the latest stack only one compute node | 23:33 |
mardinator_ | and each compute node has 4SIMDs though | 23:33 |
mardinator_ | so one compute unit each having four SIMDs | 23:34 |
ZipCPU | So ... how many compute nodes? | 23:34 |
mardinator_ | only one compute unit, that can run 4ALUs | 23:34 |
mardinator_ | and inflight warp/wavefront count is max 40 | 23:35 |
mardinator_ | the problem is, that i can not be entirely sure, for instruction replays | 23:35 |
mardinator_ | whether they are done based off flops that originate from regfile hierarchical instances, or wavefront ones | 23:35 |
mardinator_ | there are 256 simd registers per ALU units or SIMDs so to speak | 23:36 |
mardinator_ | so whether the instances per SIMD are 40 issue flops or 256 is the dilemma | 23:36 |
mardinator_ | amd docs of dx12 kinda are also very raw and not much saying, but people have dug out | 23:37 |
mardinator_ | that it is more closer according to their documents to 256 per SIMD and 1024 per CU those decode_wr_data flops | 23:37 |
mardinator_ | heheee, very weird | 23:38 |
mardinator_ | and i am in the middle of having in one file 40of them | 23:38 |
mardinator_ | and in another 268 | 23:38 |
mardinator_ | which i am still processing, i shrank it by rounding to 256, but there was a theory which i already forgot | 23:39 |
mardinator_ | ZipCPU: the basic idea of such issue queues is shortening the pipeline with async queue toggles | 23:44 |
mardinator_ | where LSU would schedule the instructions extremely fast without entering to couple or more first pipeline stages | 23:45 |
mardinator_ | so to speak it is very very important piece of the puzzle that i am trying to be sure about | 23:46 |
ZipCPU | Got to run, look forward to hearing more about your success later! | 23:46 |
mardinator_ | yeah, i am reading for couple of hours more, and trying to debug the stuff | 23:47 |
mardinator_ | thanks btw. for your help | 23:47 |
mardinator_ | but miaow has also a dispatcher for 8compute units, that could be little bitrotten and complex to get running on the fpga | 23:48 |
mardinator_ | i mean it is even entirely magically complex , it's insanely complex | 23:48 |
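
Note on the trace discussion above: ZipCPU's point that `tfp->dump(clock_number*clock_period)` must be called in between `eval()` calls can be sketched roughly as follows. This is only an illustration, not the tutorial's code. `StubModel` and `StubTrace` are hypothetical stand-ins for the Verilator-generated class (e.g. `Vvgpr_tb`) and for `VerilatedVcdC`, so that the sketch compiles without Verilator; `tick` and `run_cycles` are likewise made-up helper names, and the half-period timestamps are an assumption about how one might lay out the two clock edges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Verilator-generated model such as Vvgpr_tb.
// It only counts rising clock edges so the sketch has something to observe.
struct StubModel {
    uint8_t  clk      = 0;
    uint8_t  prev_clk = 0;
    uint32_t counter  = 0;
    void eval() {
        if (clk && !prev_clk) ++counter;  // act on the rising edge only
        prev_clk = clk;
    }
};

// Hypothetical stand-in for VerilatedVcdC: records the dump timestamps
// instead of writing a VCD file.
struct StubTrace {
    std::vector<uint64_t> times;
    void dump(uint64_t t) { times.push_back(t); }
};

// One clock cycle: every eval() is followed by a dump() with a strictly
// increasing timestamp, so each edge shows up in the trace.
template <class Model, class Trace>
void tick(Model& top, Trace& tfp, uint64_t cycle, uint64_t period) {
    assert(period >= 2 && period % 2 == 0);
    top.clk = 1; top.eval(); tfp.dump(cycle * period);               // rising edge
    top.clk = 0; top.eval(); tfp.dump(cycle * period + period / 2);  // falling edge
}

// Run n cycles, numbered from 1, and return the recorded timestamps.
StubTrace run_cycles(StubModel& top, uint64_t n, uint64_t period) {
    StubTrace tfp;
    for (uint64_t c = 1; c <= n; ++c) tick(top, tfp, c, period);
    return tfp;
}
```

With a real Verilated design the stubs would be replaced by the generated class and a `VerilatedVcdC` object set up as ZipCPU describes (`tb->trace(tfp, 99); tfp->open("tracename.vcd");`), with the design Verilated using `--trace`.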