Friday, 2019-03-08

*** tpb has joined #tomu00:00
*** futarisIRCcloud has joined #tomu00:01
xobstnt: not yet, but I'm getting close00:37
xobsmithro: same place I've been stuck for a while -- massive corruption when using epfifo on real hardware, even to the point where responses aren't coming when they should.00:38
xobsBut I'm starting to narrow down the problem.  The current working theory is that it has to do with the phase relationship between usb48 and usb12.  I'm narrowing the delay down and I think it's getting better.  I'm going to spend today trying to convince the PLL to give me a perfectly-aligned 48 MHz clock.00:39
mithroxobs: If I recall, didn't we have it responding at some point?00:41
xobsmithro: it responds, just not well. I don't think we've ever had it responding, meeting timing, and working on hardware.00:44
mithroxobs: At LCA we were only receiving the data?00:45
xobsWe were, yeah.  And that pathway works just fine, as far as I can see.00:46
mithroxobs: Way I would debug is start with just adding the NRZI and making sure it will send a sequence, then add in the bit stuffer, then add in the shifter00:49
xobsmithro: Sending works fine, too.  I can send data using the unififo and see it reliably on the scope.00:50
xobsI couldn't figure out how to get access to the state machine bits, so I manually had it set various signals inside the Transfer SM.  I noticed the "ERROR" state was getting entered.00:51
xobsThat's why I spent a lot of time this week trying to figure out how FSMs work in Migen.  I really wish they were documented.00:51
xobsAnd now I've come to the conclusion that it has to do with the delay in clock signal between usb12 and usb48.  I got the delay down to... 2.5ns?  1.5ns?  I forget, I had to leave it yesterday.  But after doing that, the number of errors went down.00:52
mithroxobs: Looks like you removed all the CDC from the USB12 to USB4800:55
xobsmithro: I did, and that didn't change the pattern of corrupt transmissions.00:55
mithroxobs: Do you have a capture of the corrupt transmissions?00:59
mithroxobs: Also, if you remove the bitstuffer + nrzi from the front end you can get a better idea of what bytes the shift register is sending01:00
xobsI can get one in a few hours, when I go to the office.01:00
xobsBut oftentimes it's doing things like not even sending ACKs/NAKs.01:00
mithroxobs: What docs do you need beyond ?01:01
tpbTitle: migen/ at master · m-labs/migen · GitHub (at
mithroxobs: I can try and help you debug it when you get into the office01:02
xobsmithro: Lots.  What does `NextValue()` do? What does `NextState()` do?  Why would you use `NextValue()` instead of `signal.eq()`?  What does `before/after_entering/leaving()` do?  How can I access the current state?01:03
xobsI've managed to answer a lot of those questions by looking at the Verilog that gets output.01:03
mithro            * synchronous statements of form `NextValue(a, b)`, equivalent to `self.sync += a.eq(b)` when the FSM is in the given `state`;01:04
tpbTitle: migen/ at master · m-labs/migen · GitHub (at
futarisIRCcloudSounds like good progress guys.01:30
futarisIRCcloudxobs / mithro: NB, cbjamo was asking a couple of questions re: USB3340 / ULPI on #timvideos ... Would it be easy to remove the USB phy from valentyusb hardware / gateware and replace it with ULPI?01:34
mithrofutarisIRCcloud: Yeah I saw01:34
mithrofutarisIRCcloud: I looked at it a while back and can't remember the conclusion of whether it makes sense or not...01:35
futarisIRCcloudmithro: Yep. I haven't looked enough into ULPI either.  I'd think rohitksingh might know.01:37
mithrofutarisIRCcloud: Yeah01:38
*** Farscrap has quit IRC03:50
*** Farscrap has joined #tomu04:05
xobsOkay, I think I got the phase difference between clk48 and clk12 down to 1.6ns.  Let me see if that affects things.04:50
* xobs uploaded an image: image.png (175KB) < >04:55
xobsmithro: it's looking much better in that it's actually "ACK"ing now.  But neither the Pi nor the Beagle think those packets are acceptable for some reason.04:56
*** rohitksingh_work has joined #tomu05:07
* xobs uploaded an image: image.png (363KB) < >05:13
xobsThough it still does this, where it doesn't NAK or ACK an IN token for some reason.05:13
mithroxobs: I'm back around now05:35
mithroxobs: So the unififo interface is able to transmit a sequence of bytes correctly, it just can't respond quick enough?05:37
xobsI'll come up with something that responds automatically. At the very least it should be sending ACK packets in response to SETUP packets.05:39
xobsAnd it's just not doing that.05:39
xobs<xobs "I'll come up with something that"> Weird. I sent those messages two days ago.05:40
xobsmithro: correct, USB needs to respond within 6.5 bit times, which is well below even the interrupt latency of Vex.05:40
mithroxobs: Was the epfifo working before you tried to reduce the turn around time?05:40
xobsmithro: no, same issue.05:45
mithroxobs: Do you have an example of that issue?05:48
mithroxobs: 6.5 bit times == 6.5 * 12MHz clock right?05:49
xobsYes, 6.5 * 12 MHz clock.05:50
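To put the turnaround figure above into units, here is a minimal sketch that converts 6.5 bit times into nanoseconds and clock cycles. It only uses the numbers quoted in the chat (6.5 bit times, a 12 MHz bit clock, a 48 MHz sample clock); the function and constant names are made up for illustration.

```python
# Turnaround budget for a full-speed USB device, using only figures from the chat.
BIT_RATE_HZ = 12_000_000      # full-speed USB bit rate (usb12 domain)
SAMPLE_CLK_HZ = 48_000_000    # oversampling clock (usb48 domain)
TURNAROUND_BITS = 6.5         # response deadline quoted above


def turnaround_budget(bits=TURNAROUND_BITS):
    """Return (nanoseconds, usb12 cycles, usb48 cycles) for `bits` bit times."""
    seconds = bits / BIT_RATE_HZ
    return seconds * 1e9, bits, seconds * SAMPLE_CLK_HZ


ns, cycles12, cycles48 = turnaround_budget()
print(f"{ns:.1f} ns = {cycles12} usb12 cycles = {cycles48:.0f} usb48 cycles")
```

That works out to roughly 542 ns, or 26 cycles of the 48 MHz clock, which is why the response has to come from gateware rather than from a CPU interrupt handler.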
xobsI could check out an earlier version of valentyusb and build that...05:50
mithroxobs: There is a lot I don't understand about this commit ->
tpbTitle: partially working commit · xobs/[email protected] · GitHub (at
xobsmithro: That was me guessing how FSMs work.  But the idea there is to enable tx.i_oe (i.e. start transmitting) as soon as possible, without wasting one cycle going between the IDLE state and the QUEUE_SYNC state.05:57
mithroxobs: Can you get me a capture of what the output is like before you made those changes?06:06
xobsmithro: I still think it has to do with the clocks.06:08
* xobs uploaded an image: image.png (512KB) < >06:08
xobsI'm running at revision 14e116ff82b825069a92292308f4f66ad6167dff06:08
xobsI'm using the newer clock tree, where usb48 is 1.5ns delayed from usb12.06:09
xobsEven with that version I'm getting strange behavior, such as usb_ep_0_in_ibuf_empty_read() getting stuck at 0 and never emptying out.06:14
mithroxobs: I think the clocks are a red herring06:15
xobsAnd sometimes it just never responds.06:15
* xobs uploaded an image: image.png (198KB) < >06:15
mithroxobs: If the state machine goes into the error state, it will stop responding06:17
mithroxobs: You probably want to expose the error state in a CSR somewhere06:18
xobsmithro: I'm not sure how to do that, so I'll just hook it up to the PMOD header.06:23
xobsIt looks like there are at least two states it gets into: The first is where it's in the ERROR state, and the second is where it refuses to empty the ibuf, and /then/ it goes into the ERROR state.06:25
xobsmithro: Is there any documentation on how to use CSRs?  I've made my class derive from "AutoCSR" and I've wired up CSRStatus objects in a comb so that their `status` output gets assigned the value I'm looking at, but it's not showing up in the header file.06:37
rohitksingh_workHi guys! I'll be diving in fomu and ice40 today. Hoping to follow the progress on the USB side. Which repos are the latest one which I should follow? xobs's valentyusb repo seems to be latest one...any more?06:39
mithroxobs: If it isn't appearing in the header, then it's most likely you're not including the module?06:42
mithroxobs: Are you sure you are arming the endpoint correctly?06:43
mithroHrm, it looks like I might need to head to bed as have an early meeting tomorrow....06:44
xobsHi rohitksingh_work , I'm not sure what the best repository to start on is.  Probably
tpbTitle: GitHub - xobs/foboot: FPGA-half (at
xobsmithro: I'm creating the CSR in my BaseSoC (which derives from Module).  Does that not work?06:45
rohitksingh_workxobs: awesome! thanks06:46
mithroxobs: You sure you are having the right BaseSoC called?06:46
mithroxobs: also check you are looking at the right header file06:47
mithroxobs: rm ing it is a good way to check06:48
xobsAnd sometimes you'll notice that it enters the ERROR state right away, which shouldn't happen because I do usb_ep_0_out_respond_write(EPF_ACK); usb_ep_0_in_respond_write(EPF_ACK); right before doing usb_pullup_out_write(1); so it should always respond with ACK.06:48
xobsMy BaseSoC derives from SoCCore (which derives from Module), and from AutoCSR.06:48
xobsmithro: I moved the CSRStatus from my BaseSoC into the epfifo, and the registers appeared.06:54
*** _florent_ has joined #tomu07:09
*** xkapastel has quit IRC07:30
xobsIt always seems to be in the WAIT_DATA state when it gets what I think is an OUT packet.07:38
xobsI'm still wondering: why is it missing packets sometimes?  Such as ignoring a SETUP followed by a DATA0 packet.07:39
xobsmithro: how does sm/ even work? It sets NextValue(self.o_pid, pid) in the WAIT_PID state, then maybe moves on to WAIT_BYTE0 and WAIT_BYTE1.  How does the pid not get reset?  I thought migen reset values to 0 if nothing is driving them...07:51
tntxobs: Isn't that only for the '.eq' stuff and not the NextValue() stuff.08:08
xobstnt: The comment mithro posted says that in an FSM, you can have "synchronous statements of form `NextValue(a, b)`, equivalent to `self.sync += a.eq(b)` when the FSM is in the given `state`;", which means that when it changes states there's nothing driving it anymore.08:10
xobsMaybe you're right, tnt , and this issue is caused by the bit detector not aligning correctly...08:11
tntLooking at some migen code my understanding is more like it creates a mux in front of a register with a CE.08:12
tntmux controlled by state and CE is '1' if in anystate that has a NextValue for that signal.08:12
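tnt's description (a mux in front of a register with a clock enable) can be sketched as a plain-Python behavioural model. This is not Migen code and not its generated Verilog; the class, the table, and the `run` helper are hypothetical, and only the state names are borrowed from the chat. It shows why the register keeps its value after the FSM leaves the state that assigned it.

```python
class EnabledRegister:
    """A register with a clock enable: it only loads a new value when `ce` is true."""

    def __init__(self, reset=0):
        self.value = reset

    def clock(self, ce, d):
        if ce:
            self.value = d
        return self.value


# Illustrative table: only WAIT_PID has a NextValue() for o_pid.
NEXT_VALUE = {"WAIT_PID": lambda pid: pid}


def run(states, pid_in):
    o_pid = EnabledRegister()
    trace = []
    for state in states:
        ce = state in NEXT_VALUE                    # CE is high only in states that drive the signal
        d = NEXT_VALUE[state](pid_in) if ce else 0  # mux output, ignored while CE is low
        trace.append(o_pid.clock(ce, d))
    return trace


trace = run(["IDLE", "WAIT_PID", "WAIT_BYTE0", "WAIT_BYTE1"], pid_in=0x3)
print(trace)
```

In this model the value loaded in WAIT_PID is still present in WAIT_BYTE0 and WAIT_BYTE1, because nothing enables the register there.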
tntxobs: not sure how you do testing, but when I was trying out my core, I basically 'replayed' a captured sequence and purposefully drifted the clock +- 1000 ppm to make things drift in simulation.08:14
xobstnt: That seems like a good way to do it.  What do you use for simulation?08:16
futarisIRCcloudtnt: Sounds like a good way to test.08:18
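As a rough sketch of the drift test tnt describes, the following estimates how quickly a +/- 1000 ppm clock offset eats into the sampling margin. It assumes 4x oversampling (48 MHz against 12 Mbit/s, as in the chat); the helper names are hypothetical and this is not tnt's test bench.

```python
NOMINAL_BIT_S = 1 / 12_000_000   # one full-speed USB bit time in seconds


def bits_until_slip(ppm, samples_per_bit=4):
    """Bit times until an offset of `ppm` accumulates one sample period of error."""
    drift_per_bit = ppm * 1e-6               # fraction of a bit time gained or lost per bit
    return (1 / samples_per_bit) / drift_per_bit


def drifted_edges(n_bits, ppm):
    """Timestamps (seconds) of bit boundaries for a sender that runs `ppm` fast or slow."""
    period = NOMINAL_BIT_S / (1 + ppm * 1e-6)
    return [i * period for i in range(n_bits)]


n = round(bits_until_slip(1000))
skews = {}
for ppm in (-1000, +1000):
    skews[ppm] = drifted_edges(n + 1, ppm)[n] - n * NOMINAL_BIT_S
    print(f"{ppm:+d} ppm: {skews[ppm] * 1e9:+.1f} ns skew after {n} bits")
```

At 1000 ppm the skew reaches about one 48 MHz sample period (roughly 20.8 ns) after some 250 bits, so a replayed capture of a few packets is already long enough to force the recovered strobe to move.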
rohitksingh_workok, my attempt at soldering jumper wires to fomu test pads blew up. it's been so much time since I used a soldering iron. I will just get it soldered by a professional :-|08:19
futarisIRCcloudrohitksingh_work: Try using test clips...08:22
rohitksingh_workfutarisIRCcloud: oooh, that's a nice meant the salae kind of test clips?08:24
tpbTitle: fomu-hardware/hacker at master · im-tomu/fomu-hardware · GitHub (at
rohitksingh_workxobs: that's nice! I have some of those clips. I'll try with them, thank you so much!08:37
futarisIRCcloudrohitksingh_work: giomasce wrote up the document about the poor man's programmer.08:43
rohitksingh_workoh okay. I can see his commit on github08:59
tntxobs: iverilog.09:37
tntxobs: test bench and data file I used are here
tpbTitle: ice40-playground/cores/usb/sim at experimental · smunaut/ice40-playground · GitHub (at
tntbut it's pure verilog not migen stuff :/09:37
tntnot sure how adaptable it would be to your case.09:38
xobstnt: it might be interesting to use. Or at least to test!10:22
xobsOkay, it looks like the input pathway is corrupting about 10% of packets. At least I know that's the source of the problems. I'm going to take a break for now.10:54
*** awe00 has joined #tomu10:55
tntxobs: did you reproduce it in sim or was that on real hw ?10:58
xobsReal hardware. Simulation still says everything is peachy.11:02
tntTrying to simulate it now, but I have no idea what I'm doing, never really used migen before :p11:37
futarisIRCcloud10% ???11:40
tnttbh, I would also expect that to be build dependent. I mean the 12m alignment to the 48m is going to be dependent on how far the FF generating it is from the global buffer input. Also, it's relying on nextpnr to actually promote that signal to a global buffer automatically.11:43
futarisIRCcloudThe FF?11:54
tntflip flop11:55
tntThe 12m is generated by dividing the 48m using a couple of registers. The resulting clock sure is not drifting vs the 48m, but its phase is basically random. So outputting a signal in the 48m domain and sampling it in the 12m domain and hoping it's good is a bit optimistic.11:57
tntEspecially if you hope to transmit a new data at every 12m rising edge.11:57
tntIdeally you'd need after reset to determine which phase you have btw the 48m and the 12m one, then select one of 4 possible phase in the 48m domain to output data to be crossed from 48m -> 12m and only use that one.11:58
tntBut that would only solve the internal 48m -> 12m crossing. The problem of having a recovered strobe from the incoming usb stream that can be slower/faster than your internal 12m clock would remain.11:59
xobs<freenode_tnt "But that would only solve the in"> tnt: I have a build I'm using that I need to push. I'm dividing the 48 down through a FF, then running that through the pll. They're 1.6ns out of phase with that trick, and it doesn't change between builds.12:06
tntI just realized the pins usb_d_p / usb_d_n are not actually the pins used for usb ... probably why I couldn't get anything to do anything in my simu lol12:11
tntxobs: ok. Is that the rising edge of clk12 lagging behind by 1.6ns or being 1.6 ns ahead of the 48m edge ?12:13
xobsClk12 rises first, as it's driving clk4812:16
tntwhy are you just not using the PLL to get both 12 and 48 btw ?12:18
xobsCan it do that? My impression was that (1) the lowest frequency it'll output is 16 MHz, and (2) the dividers are fixed to 3.5 or 7.12:19
tntmmm, right, it can pass along the (delayed) input reference clock, which in my case is 12M that's why I can get 12M and 48M, but your input clock is 48M.12:22
tntYou might still be able to use it to pass along your 48M, program the PLL for 24M and then use the GENCLK_HALF output mode.12:23
xobsYeah, I'm considering changing the crystal to 12 MHz in the final version.12:23
xobsDoesn't genclk_half say nothing about the phase relationship?12:23
tntAFAIK the edges are aligned ... but ... the docs on the PLL are really not that clear / explicit.12:24
tntIs the output of the rx shifter o_put / o_data supposed to be the packet bytes already aligned ?12:35
tntFirst packet is fine, but even the second is already screwed up  PID byte is reported as C2 instead of C312:39
xobsI'm pretty sure it's supposed to be aligned (cc: mithro), but that sounds like exactly the problem that I'm hitting.12:53
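A quick way to flag a byte like the C2 above as corrupt: in a USB PID byte the upper four bits are the bitwise complement of the lower four. The check below is a minimal sketch; the function name is made up and is not part of valentyusb.

```python
def pid_is_valid(pid_byte):
    """True when the high nibble is the bitwise complement of the low nibble."""
    low = pid_byte & 0xF
    high = (pid_byte >> 4) & 0xF
    return high == (~low & 0xF)


for value in (0xC3, 0xC2):
    print(f"0x{value:02X}: {'ok' if pid_is_valid(value) else 'corrupt'}")
```

0xC3 (the DATA0 PID byte) passes, while 0xC2 fails because its low nibble is 0x2 and the complement of that would be 0xD, not 0xC.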
xobstnt: I just got home.  This is my clock tree that generates the 1.6ns delay I was talking about:
tpbTitle: foboot/ at master · xobs/foboot · GitHub (at
tntAlso btw, nextpnr reports Info: Max delay posedge clk48_1 -> posedge clk12_$glb_clk: 13.05 ns12:59
tntat the very least, it should go register to register because nextpnr doesn't constrain cross-clock path at all, just reports them.12:59
xobsThat's fine, depending on which "clk48" it's talking about.  With the PLL-derived clk48, there are two "clk48" domains: one that's raw from the crystal, and one that's derived from the PLL.13:00
tnterr maybe. But I had so much trouble in the past with CDC that I don't take any chances nowadays. The only way I would trust output from 48m to be captured in 12m successfully is to ensure the 48m strobe rising edge matches the 12m falling edge and go register to register to be right in the middle of that data valid eye.13:08
*** rohitksingh_work has quit IRC13:09
futarisIRCcloudCDC = clock domain crossing13:09
xobsWell, I /can/ tell you that I accidentally discovered that the timing is very good on it, because I was measuring the phase relationship between them and accidentally set my scope to 12 kHz.  It didn't seem to mind, and the aliasing made it seem reasonable.13:09
xobsBut I agree, CDC is hard.  Which, again, is why I'm considering changing the BOM item for the next batch.13:10
tntBut without explicit sync you can always get what I posted in the screenshot above.13:10
xobsI wonder if it's worth it to run the PLL at 12 MHz despite it being out of sync.13:10
tntThe clock recovery strobe moves one cycle back/forth and that just happens to make it go from one 12m cycle to the next, completely screwing up the data.13:11
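The failure mode tnt describes can be illustrated with a toy model: count how many recovered-bit strobes (timed in 48 MHz cycles) fall inside each 12 MHz period. The cycle numbers below are invented for illustration and do not come from a capture; the function name is hypothetical.

```python
from collections import Counter


def strobes_per_12m_cycle(strobe_cycles, ratio=4):
    """Count how many 48 MHz strobes land in each 12 MHz period."""
    counts = Counter(cycle // ratio for cycle in strobe_cycles)
    last = max(strobe_cycles) // ratio
    return [counts.get(i, 0) for i in range(last + 1)]


steady = [3, 7, 11, 15, 19]   # strobe always in the last quarter of the 12 MHz period
jitter = [3, 7, 12, 15, 19]   # third strobe arrives one 48 MHz cycle late

print(strobes_per_12m_cycle(steady))
print(strobes_per_12m_cycle(jitter))
```

With the steady strobes every 12 MHz period sees exactly one bit. With a single one-cycle slip, one period sees no strobe and the next sees two, so a bit is lost or duplicated from the 12 MHz domain's point of view.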
xobsYeah, totally agreed.  It's not great at all, and the bit error rate right now is too high to be usable.13:12
xobsBut at least the clock is *mostly* good now, and thanks to your admonishments we're pretty sure where the problem lies.13:13
tntWhen I talked to mithro at ccc, I thought that the 8:1 serdes was done in the 48m domain, thus you just had 1 strobe every so often from 48m to 12m giving you ample time to cross clock.13:13
tntYou could just try to instantiate a sync fifo passing the bits back and forth between the two domains ... it's hugely wasteful resource wise, but should be easy to do in migen and safe.13:14
tnt"async fifo" not "sync" obviously13:15
tntlol, I had to google "admonishments" :)  I hope I didn't come out as pushy or anything, just trying to help with stuff I've seen in the past.13:16
xobsIt's okay, I'm sorry, I'm just getting a bit frustrated with this.  It's very challenging, and not something I've done extensively before.  So I know I'm running into all sorts of issues.13:18
xobsI'm pretty sure it's not a problem with CDC, because I'm using an older version that has async fifos all over the place.  It won't necessarily respond in time, but even that's getting corrupt packets.13:21
tnt:/ Do you have a commit id ?  I'm curious to run it through the sim I have setup here.13:24
[...] least I thought it did.  Now I'm just seeing it use cdc.MultiReg13:24
tntoh yeah, MultiReg wouldn't help.13:24
xobsAh, the AsyncFIFOs are in the CPU interface.  Maybe I'll throw that hammer at it then.13:25
xobs is the tree that I'm looking at.13:25
tpbTitle: GitHub - xobs/valentyusb at 14e116ff82b825069a92292308f4f66ad6167dff (at
xobsHow are you running this through your testbench?  Just generating the Verilog files and hooking them up to iverilog?13:27
tntJust a thought, but the RxShifter is pretty small ... I don't see why that couldn't be moved to the 48m domain, IMHO that would be way simpler to cross 8 bits at once every 32 clock cycles than 1 bit every 4.13:27
tntxobs: Let me zip the whole thing.13:27
tntI generated the top.v normally. Then I did manually patch a couple of things in it. I decreased the reset counter from 4096 to 100 to get it to start quicker, and I also added a 5 ns delay between clk_12m edge and clk_48m edge (that was to simulate the hw, before I knew about the PLL).13:31
xobsThat seems very, very sane.13:51
tntrunning the older tree through that sim still shows error for me. At different places, but still bytes are corrupted.  Also the shifter data bus seems bit-reversed ?13:57
xobsThat does make sense.  I think that, because of how it's set up (with the sentinel bit shifting right), it ends up going into the shifter backwards, and when it's wired into the CPU interface it gets sorted out the right way.13:58
tntyeah, migen is a bit weird that way with 0:7 vs 7:0 in verilog, ... sorts itself out if you stay in migen I guess but when looking at it externally it's a bit confusing.13:59
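To make the bit-reversal point concrete, here is a toy sentinel-based shifter in plain Python. Which way the real RxShifter shifts is not shown in this log, so the sketch assumes new bits enter at bit 0, which is one way to end up with a bit-reversed byte when the wire sends the least significant bit first. All names are hypothetical.

```python
def shift_in(bits, width=8):
    """Collect `width` bits per byte using a sentinel; new bits enter at bit 0."""
    reg = 1                          # the sentinel: a lone 1 that marks how full the register is
    out = []
    for bit in bits:
        reg = (reg << 1) | bit
        if reg >> width:             # sentinel reached bit `width`, so a full byte is present
            out.append(reg & ((1 << width) - 1))
            reg = 1                  # reload the sentinel for the next byte
    return out


def reverse_bits(value, width=8):
    """Return `value` with its `width` bits in the opposite order."""
    return int(f"{value:0{width}b}"[::-1], 2)


setup_pid = 0x2D                                        # SETUP PID byte
wire_bits = [(setup_pid >> i) & 1 for i in range(8)]    # USB sends the LSB first
raw = shift_in(wire_bits)[0]
print(hex(raw), hex(reverse_bits(raw)))
```

The raw register value looks wrong when viewed externally, and reversing the bit order recovers the original byte, which matches the "sorts itself out" behaviour described above.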
*** awe00 has quit IRC14:05
*** xkapastel has joined #tomu14:05
*** awe00 has joined #tomu14:11
xobstnt: Ah, I see.  You're ignoring the CSR problem and just sending data to it.  That's a really good approach.14:47
tntWhat's the CSR problem ? :)14:48
xobstnt: The thing I haven't figured out yet is how to simulate the whole thing.  CSRs are how you control the block and get status.  For example, there are signals that get wired to an EventManager that shows up in a memory-mapped register.  I've been trying to figure out how to access those registers in a simulator.14:50
tntAh yeah. Here since I just wanted to see the behavior of the clock crossing / deserializer, I didn't bother.14:53
xobsBut since that's where the problem lies, that's probably good enough.14:53
*** awe00 has quit IRC15:06
*** awe00 has joined #tomu15:08
xobsWow, tnt, this iverilog output looks so cool.  And it's super fast to generate.  Now I need to get the USB sigrok filter working, too.15:20
tpbTitle: [Diff] diff --git a/valentyusb/usbcore/rx/ b/valentyusb/usbcore/rx/ index - (at
tntThat's a quick hack to put a fifo in there. The data output of the shifter looks a whole lot saner to me with this.15:23
tntthe sigrok decoding can be a bit fussy, sometimes I have to 'refresh' (using the reload button) several times for it to decode properly.15:26
*** awe00 has quit IRC15:35
*** futarisIRCcloud has quit IRC15:39
*** awe00 has joined #tomu15:41
mithrotnt: fixing the clock 48 to 12 handling was on my to-do list15:51
tntthe hack I posted above might be enough to confirm it's the issue and it fixes the corruption xobs was seeing on RX. But not sure I'd call it a 'fix'. That fifo adds unacceptable latency.15:52
mithrotnt: correct15:53
xobsIt's enough to get started.15:53
xobsI'm stuck trying to get sigrok filtering working now, though :)15:53
tntAh, well might be able to help with that. What are the symptoms ?15:54
xobstnt: no errors, but nothing happens.15:55
xobsFair warning: I'm running this on Windows.15:56
xobsSo if I run it under gtkwave.exe and I add the filter of "C:\Python36\python.exe" and set the args to " -P usb_signalling:signalling=full-speed,usb_packet:signalling=full-speed" then it at least shows decoded bits.15:56
xobsIf I run it under `gtkwave` on Ubuntu 16.04, and add `` as the filter, then I just get a solid yellow line.15:57
mithroxobs: check the time scale in the vcd file?15:58
xobsI guess it's a good sign that, under Ubuntu, if I run " < [vcdfile]" it also has no output, so at least it's being consistent.15:59
tntThis is what I do :
tpbTitle: Open VCD From usb_tb, Add usb_dp From usb_tb, Add usb_dn Select them both in - (at
tntwell you can't just feed the simulation vcd file to dec_usb. gtkwave will generate a specially formatted vcd with just the data selected.16:00
tntBut if you see the bits, you just need to insert blank traces at the right place to see the rest.16:00
xobsSo on Windows, I see bits, and if I do "Insert Blank" I get a red trace with no data that I can drag up to be directly between "USB" and "usb_dn".16:02
xobsOn Ubuntu I just get a solid yellow line and no error messages.16:02
xobsOh, that's nice, Ubuntu's console actually works.16:03
xobsRather, `Unknown option --protocol-decoder-samplenum`16:03
xobsGuess I need to update sigrok.16:03
tntah yeah, that's the option to get it to output which sample number each decoded output matches to.16:04
mithroxobs: btw I would expect iverilog to be roughly 1000 times faster than Migen's simulator16:04
tntCan you post a screenshot of windows ? Never saw that behavior :/16:04
* xobs uploaded an image: image.png (53KB) < >16:05
tntOh, the decoder filter isn't even active or you wouldn't see 'USB' as the name of the trace.16:05
mithroxobs: I would make sure you have sigrok from head16:09
xobsAlright, I'll figure out how to build it tomorrow.  I actually modified the Python file to call sigrok-cli.exe, and I think that's actually working, it's just not able to read the tempfile.16:09
xobsAnyway, thanks for the help!  This is looking very promising.16:10
xobsFor now, I think I'm going to head to bed.16:10
mithroI'm sure I showed you how to view this on Migen output too?16:11
*** xkapastel has quit IRC16:15
*** xkapastel has joined #tomu16:25
tntOh, there is the small issue that with the fifo it doesn't meet timing :/16:29
mithrotnt: Need to create a 1bit buffer using flipflops16:31
tntInfo: 6.1 ns logic, 16.5 ns routing16:35
tnt ICESTORM_LC:  4797/ 5280    90% is probably part of the issue.16:37
tntOk, tweaking the synthesis options brought that to 4575/ 5280   and 10 MHz faster, to 53 MHz16:41
tntI have no idea if migen allows it, but adding "-relut -dffe_min_ce_use 4" to synth_ice40 options is pretty much always a gain in both size and frequency.16:43
*** awe00 has quit IRC17:01
*** rohitksingh has joined #tomu17:33
*** rohitksingh has quit IRC19:36
*** rohitksingh has joined #tomu19:49
*** rohitksingh has quit IRC21:13
*** Farscrap has joined #tomu23:54
