Friday, 2019-02-22

*** tpb has joined #opencores00:00
*** O01eg has quit IRC05:01
*** book` has quit IRC10:22
*** book` has joined #opencores10:25
*** book` has joined #opencores10:28
*** mardinator_ has joined #opencores20:26
mardinator_hi, has anyone used verilator before?20:27
ZipCPUmardinator_: Yes.  I use it regularly20:32
mardinator_well, i did get things generated for c++20:33
mardinator_but it complains about not having a main function20:33
ZipCPUThat part you have to build20:34
ZipCPUCheck out lesson 1 of
tpbTitle: Verilog Beginner's Tutorial (at
mardinator_but I was able to get all the info I needed from the c++ files too, just figured20:34
ZipCPUIt goes through generating a "main" function20:34
mardinator_ok, thanks20:34
mardinator_i just think that people have less trouble to read VCD files20:35
ZipCPUThere's also a decent article for more advanced designs with Verilator at with the title, "Taking a new look at verilator"20:35
mardinator_with say gtkwave20:35
ZipCPUYou can get VCD files from Verilator--I do it all the time20:35
mardinator_ZipCPU: yep, but i think not before i will be able to compile the c++ files, i.e when i get the main function right20:36
mardinator_ZipCPU: btw. have you used icarus verilog?20:36
ZipCPUYeah ... you not only need the main function, but the main function has to call some trace generating functions as well20:36
mardinator_i did that already20:37
mardinator_#include "Vvgpr_tb.h" #include "verilated.h" #include "verilated_vcd_c.h" int main(int argc, char **argv, char **env) { int i; int clk; Verilated::commandArgs(argc, argv); // init top verilog instance main* top = new main;20:37
ZipCPUDid you call verilator with the --trace option?20:37
mardinator_and ran the stuff, as Verilator -exe main.cpp ...20:38
mardinator_yeah...listen i have not yet looked at your tutorial, maybe i did something wrong20:38
ZipCPUDid you do the: VerilatedVcdC *tfp = new VerilatedVcdC; tb->trace(tfp, 99); tfp->open("tracename.vcd"); ... ?20:38
mardinator_i used another one before20:38
ZipCPUTrace generation is covered in lesson two ... ;)20:38
ZipCPUIn between eval's, you then need to call tfp->dump(clock_number*clock_period);20:39
mardinator_but it looks like , icarus verilog did not generate the bits that i was interested in20:39
ZipCPUThere's some sticky details about how to do it too20:39
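The eval/dump interleaving described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's code: `tick`, `FakeModel`, `FakeTrace` and `demo_timestamps` are hypothetical names, the 10-unit clock period is an assumption, and the two `Fake*` types are stand-ins so the example compiles without Verilator. The only things taken from the log are the calls to `eval()` and `dump(timestamp)`.

```cpp
#include <cstdint>
#include <vector>

// One clock cycle with a trace dump after every eval().
// Model must expose a `clk` member and eval(); Trace must expose dump(uint64_t).
// `cycle` must start at 1 so the first timestamp does not underflow.
template <class Model, class Trace>
void tick(Model &tb, Trace *tfp, uint64_t cycle, uint64_t period = 10) {
    tb.eval();                                   // settle inputs changed since the last edge
    if (tfp) tfp->dump(cycle * period - 2);
    tb.clk = 1;                                  // rising edge
    tb.eval();
    if (tfp) tfp->dump(cycle * period);
    tb.clk = 0;                                  // falling edge
    tb.eval();
    if (tfp) tfp->dump(cycle * period + period / 2);
}

// Hypothetical stand-ins so the sketch compiles without Verilator.
struct FakeModel {
    unsigned char clk = 0;
    int evals = 0;
    void eval() { ++evals; }
};
struct FakeTrace {
    std::vector<uint64_t> times;
    void dump(uint64_t t) { times.push_back(t); }
};

// Runs `cycles` ticks on the stand-ins and returns the dumped timestamps.
std::vector<uint64_t> demo_timestamps(int cycles) {
    FakeModel tb;
    FakeTrace trace;
    for (int c = 1; c <= cycles; ++c) {
        tick(tb, &trace, static_cast<uint64_t>(c));
    }
    return trace.times;
}
```

In a real testbench the model would presumably be the generated class (Vvgpr_tb in this log) and the trace a VerilatedVcdC that has already been attached with tb->trace(tfp, 99) and opened with tfp->open(...), with Verilator invoked with --trace. The `clk` member only exists on the generated class if the top-level module has a port of that name.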
mardinator_so verilator does something more realistically20:39
ZipCPUIt can be very realistic if you would like20:39
mardinator_maybe it is because of some coverage issue, but the miaow flops i was interested in did not get dumped by icarus verilog at all it seems, only by verilator20:40
ZipCPUCurious.  Don't know what to make of it20:41
mardinator_those are issue queues of miaow, 1024 instances of 214 bits of decode_wr_data20:41
mardinator_per compute unit, verilator shows them, but iverilog does not in its trace20:41
mardinator_or generated a.out20:41
mardinator_those instances show up, as hw instances though marked as unused20:45
mardinator_maybe icarus verilog only generates the used ones if wired across modules20:46
ZipCPUSome synthesizers support a (* keep *) attribute that can be used to make certain particular values of interest remain within the design20:46
mardinator_anyhow, i am slow today, i try to see if the VCD file matches my theory, so reading the tutorial20:50
mardinator_grep -in ".*vtemp3472.*" Vvgpr_tb__Trace.cpp -R | wc -l21:11
mardinator___Vtemp3472[6U] = ((0xffc00000U & (vlTOPp->vgpr_tb__DOT__issue_test__DOT__issue__DOT__scoreboard__DOT__decode_wr_data[0U]                                            << 0x16U))21:12
mardinator_such lines, round about 256 of them and four multiplexers21:13
mardinator_and a selector based of SIMD it seems21:13
mardinator_they are all missing from icarus verilog dump21:13
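For readers unfamiliar with the generated code, the visible part of the quoted __Vtemp3472 expression can be reproduced in isolation. The quoted line is truncated, so this only illustrates the one sub-expression shown; `low10_to_top` is a hypothetical name. Verilator keeps signals wider than 64 bits as arrays of 32-bit words, so shift-and-mask expressions like this typically show up where a wide value (here the 214-bit decode_wr_data) crosses a word boundary.

```cpp
#include <cstdint>

// Same shape as the generated expression: move the low 10 bits of a 32-bit
// word into bits 31..22 of the result. 0x16 is 22 decimal, and 0xffc00000
// has exactly bits 31..22 set.
uint32_t low10_to_top(uint32_t word) {
    return 0xffc00000U & (word << 0x16U);
}
```

For example, low10_to_top(1) gives 0x00400000, and any bits above bit 9 of the input are shifted out of the 32-bit word.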
mardinator_ZipCPU: ouh i finally managed to parse your tutorial, i mean understand it...21:19
mardinator_it appears that i have to link the static library from the generated functions together with hand generated cpp21:20
mardinator_to make a final executable, this is the step i did not do before, yeah makes sense21:20
mardinator_it complained that my class had no clk and rst members, i looked those were regs in the testbench21:36
mardinator_trying to change them to wires21:36
mardinator_got the dump of VCD, but i can not see those lines included in the VCD22:04
mardinator_not sure how it boils down, if my theories were right, need to read about what that Vvgpr_tb.cpp vs. Vvgpr_tb__Trace.cpp is about...22:17
mardinator_ZipCPU: any idea if the trace file could be the full hardware, vs the other one just for active?22:18
ZipCPUmardinator_: I'm back, now reading backlog22:42
mardinator_i did get the VCD dump, but trace files show lot more hw resources than that of the module file22:43
ZipCPUI'm not sure I follow ... what would be the difference between the "full hardware" and the "just for active"?22:43
mardinator_me either :D22:49
mardinator_but i hypothesized, that maybe some parallel in parallel out methods do not show inactive instances22:50
mardinator_for an example22:50
mardinator_if the trace detects that register file instance 1011 is active in decimal reg 1011 from 102422:51
mardinator_it generates the trace instances only for that one22:51
mardinator_i.e 101122:51
ZipCPUHmm ...22:53
ZipCPUlet me try this ... is the register you are hoping to find within the trace defined within a generate block?22:53
mardinator_hmm, actually it is not22:54
mardinator_it is from arrays of instances from the regfile22:55
mardinator_i thought about this myself before also, actually yeah it could be22:56
ZipCPUTo view a register that's defined within a generate block, you need to make certain that the generate block has a name22:56
mardinator_cause yeah, the alu module feeds back done and wfid to decode_wr_data, and those are selectors22:57
mardinator_they originate from generated module22:57
mardinator_ZipCPU: i do not think their generate block has names though22:58
ZipCPUThat's easy enough to fix22:59
tpbTitle: miaow/valu.v at master · VerticalResearchGroup/miaow · GitHub (at
mardinator_evaluate yourself, the generate block starts from there22:59
ZipCPUSo following line 41, add a: begin : GEN_SIMD // and then before 58, add an else, then after 58 add a: begin : GEN_SIMF, etc.23:01
mardinator_hmm, ok, but this can not be the cause, cause my testbenches do not trigger the alu module in, as i inspected23:02
mardinator_and the register file does not seem to use generate blocks, it uses xilinx block_ram, which i downloaded myself23:08
mardinator_cause the miaow1.0 ones did not compile, and i never understood why23:08
ZipCPUOk, time to look at the .h file23:10
ZipCPUAre the variables of interest in the .h file?23:10
ZipCPUAlso, Verilator uses the (* keep *) attribute.  You might want to add it in23:10
ZipCPUFurther, if you are struggling to know what's going on, feel free to play with the .cpp file--adding in whatever debugging statements you need.23:11
mardinator_it is not in particular the variables that worry me, variables are all spotted in VCD23:15
mardinator_each and every one of them are grabbed and pulled in23:15
mardinator_what looks to be the problem, is that some of the instances are missing for the same variable names23:15
ZipCPUCheck the .h file to see 1) if they are still in the Verilated code, and 2) what their names are23:17
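When checking names in the generated header, it helps to know that Verilator flattens the module hierarchy into a single identifier with `__DOT__` as the separator, as in the vgpr_tb__DOT__issue_test__DOT__... name quoted earlier in the log. A small sketch for mapping such a name back to the dotted path a waveform viewer would show; `to_hier_path` is a hypothetical helper, not part of Verilator.

```cpp
#include <cstddef>
#include <string>

// Turn a Verilator-style flattened name into a dotted hierarchical path
// by replacing every "__DOT__" separator with ".".
std::string to_hier_path(std::string name) {
    const std::string sep = "__DOT__";
    std::size_t pos = 0;
    while ((pos = name.find(sep, pos)) != std::string::npos) {
        name.replace(pos, sep.size(), ".");
        pos += 1;  // continue searching after the inserted dot
    }
    return name;
}
```

Applied to the name from the log, this yields vgpr_tb.issue_test.issue.scoreboard.decode_wr_data, which can then be compared against the scopes present in the VCD.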
*** mardinator_ has quit IRC23:18
*** mardinator_ has joined #opencores23:19
mardinator_power cord came out, and laptop turned off.. i.e battery is dead for my hp23:20
ZipCPUCap'n!  The dilithium crystals are ...23:21
mardinator_i read somewhere, that the chinese sell large amounts of vanadium, which has a lower density for energy storage23:24
mardinator_anyways, almost appears for some reason the decode_wr_data.. gets the instances from wavefronts only 40maximum of them in the module file23:25
mardinator_however in the trace, they seem to crap in those instances based of the regfiles23:25
ZipCPUAre you working with your own design, or someone else's?23:29
mardinator_yeah someones elses, and it is very very complex design23:29
ZipCPUWhat's it supposed to do?23:29
mardinator_which is the problem for me, cause it is sometimes hard to grasp when i have important dilemmas23:30
mardinator_it is a full compute overlay circuit based of GCN gpu23:30
mardinator_for fpgas23:30
mardinator_instruction set is GCN that of amd radeon23:30
mardinator_basically it is chinese clone23:30
ZipCPUOoohh ... wow, okay23:30
mardinator_graphics core next23:30
ZipCPUThat is a big design23:31
ZipCPUAnd so you have a lot of identical compute nodes, because it's a graphics core?23:31
mardinator_actually it implements in the latest stack only one compute node23:33
mardinator_and each compute node has 4SIMDs though23:33
mardinator_so one compute unit each having four SIMDs23:34
ZipCPUSo ... how many compute nodes?23:34
mardinator_only one compute unit, that can run 4ALUs23:34
mardinator_and inflight warp/wavefront count is max 4023:35
mardinator_the problem is, that i can not be entirely sure, for instruction replays23:35
mardinator_whether they are done based on flops that originate from the regfile hierarchical instances, or the wavefront ones23:35
mardinator_there are 256 simd registers per ALU units or SIMDs so to speak23:36
mardinator_so whether the instances per SIMD are 40 issue flops or 256 is the dilemma23:36
mardinator_amd docs of dx12 are also kinda raw and do not say much, but people have dug out23:37
mardinator_that according to their documents it is closer to 256 per SIMD and 1024 per CU for those decode_wr_data flops23:37
mardinator_heheee, very weird23:38
mardinator_and i am in the middle of having in one file 40of them23:38
mardinator_and in another 26823:38
mardinator_which i am still processing, i shrank it by rounding to 256, but there was a theory which i already forgot23:39
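The two competing counts can be put side by side with a little arithmetic. All input figures below come from the log (4 SIMDs per compute unit, 256 registers per SIMD, at most 40 wavefronts in flight, 214 bits per decode_wr_data entry); the constant names are made up for illustration, and which of the two totals matches the real design is exactly the open question.

```cpp
// Figures quoted in the log.
constexpr int kSimdsPerCu       = 4;
constexpr int kRegsPerSimd      = 256;
constexpr int kMaxWavefronts    = 40;
constexpr int kDecodeWrDataBits = 214;

// Hypothesis A: one decode_wr_data entry per SIMD register.
constexpr int  kEntriesPerCu     = kSimdsPerCu * kRegsPerSimd;  // 1024
constexpr long kBitsPerRegister  =
    static_cast<long>(kEntriesPerCu) * kDecodeWrDataBits;       // 219136 bits

// Hypothesis B: one decode_wr_data entry per in-flight wavefront.
constexpr long kBitsPerWavefront =
    static_cast<long>(kMaxWavefronts) * kDecodeWrDataBits;      // 8560 bits

static_assert(kEntriesPerCu == 1024, "matches the 1024-per-CU figure in the log");
```

The gap between roughly 219 kbit and 8.5 kbit of flops per compute unit is large enough that counting the decode_wr_data entries in the VCD should distinguish the two cases.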
mardinator_ZipCPU: the basic idea of such issue queues is shortening the pipeline with async queue toggles23:44
mardinator_where LSU would schedule the instructions extremely fast without entering to couple or more first pipeline stages23:45
mardinator_so to speak it is very very important piece of the puzzle that i am trying to be sure about23:46
ZipCPUGot to run, look forward to hearing more about your success later!23:46
mardinator_yeah, i am reading for couple of hours more, and trying to debug the stuff23:47
mardinator_thanks btw. for your help23:47
mardinator_but miaow has also a dispatcher for 8compute units, that could be little bitrotten and complex to get running on the fpga23:48
mardinator_i mean it is even entirely magically complex , it's insanely complex23:48
