Sunday, May 7, 2023

PCIe Deep Dive, Part 1: Tool Hunt

Over the past few years, I've been developing and improving very fast standalone NVMe-based storage for the Zynq UltraScale+ architecture, to keep up with the absurd speeds of modern SSDs. (Drives like the Seagate FireCuda 530 and Sabrent Rocket 4 Plus-G can now hit 3GB/s+ sustained TLC write speeds, with much higher pSLC cache peaks.) But my knowledge pretty much ended at the interface to the Xilinx DMA/Bridge Subsystem for PCI Express (PG194/PG195). In the usual fashion, I'm now going to dive deeper to explore how the AXI-PCIe bridge works, and what the PCIe stack actually looks like.

Something I found interesting about PCIe in general is that there seems to be a pretty large barrier built up around the black box. Even just finding learning resources is much harder than it should be. The best I found was PCI Express Technology 3.0 and some accompanying material by MindShare, but even that seems like a prose wrapper on top of the specification. There isn't anything that I would consider a beginner's guide, like you might find for USB or Ethernet.

[Edit by Future Shane] There is a very good series of four articles from Simon Southwell starting here that offers a thorough introduction to PCIe. Definitely check it out if you're going to be exploring PCIe.

For physical tools, the situation is even more bleak. The speeds in PCIe Gen3 (8GT/s) put it in the range where an oscilloscope that can actually measure the signal will cost more than a car. But for all but the lowest-level hardware debugging, a digital capture would suffice, and that's where a protocol analyzer would be nice. Unfortunately, there is no Wireshark equivalent for PCIe; protocol analyzers for it are dedicated hardware that only a few companies develop, and they are priced astronomically.

That is...unless you scout them on eBay for a year.
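As for why the oscilloscope route is so expensive: Gen3 uses NRZ signalling at 8GT/s, so the fastest pattern on the wire (1010...) has a 4GHz fundamental, and a scope needs several times that to show a reasonable edge. The sketch below works through those numbers. The "capture up to the 3rd or 5th harmonic" figure is a common rule of thumb, not a Keysight specification, and the function names are just for illustration.

```python
# Back-of-the-envelope numbers for PCIe Gen3 signalling.
# Assumptions: NRZ signalling at 8 GT/s per lane, and the rule of thumb that a
# scope should capture roughly the 3rd to 5th harmonic of the fundamental.

GEN3_RATE_TPS = 8e9  # transfers (bits) per second, per lane, per direction


def unit_interval_s(rate_tps: float) -> float:
    """Duration of one bit on the wire."""
    return 1.0 / rate_tps


def fundamental_hz(rate_tps: float) -> float:
    """Highest fundamental of an NRZ stream: a 1010... pattern
    completes one full cycle every two bits."""
    return rate_tps / 2.0


def scope_bandwidth_hz(rate_tps: float, harmonic: int) -> float:
    """Rule-of-thumb bandwidth needed to see up to the given odd harmonic."""
    return fundamental_hz(rate_tps) * harmonic


if __name__ == "__main__":
    print(f"UI: {unit_interval_s(GEN3_RATE_TPS) * 1e12:.0f} ps")
    print(f"Fundamental: {fundamental_hz(GEN3_RATE_TPS) / 1e9:.0f} GHz")
    for h in (3, 5):
        bw = scope_bandwidth_hz(GEN3_RATE_TPS, h)
        print(f"Up to harmonic {h}: {bw / 1e9:.0f} GHz")
```

Real-time scopes in the 12 to 20GHz range are exactly the car-priced territory; a protocol analyzer sidesteps that by recovering the bits and only storing the digital stream.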

Biggest "that escalated quickly" of my test equipment stack (ref. PicoScopes below table).

This is a used U4301B that I got in what has to be my second-best eBay score of all time, for less than it would have cost me to rent one for a month. There are only ever a handful of them up for auction at any given time, and the market is so small that the price is basically random, so if you're actually looking for one I can only wish you luck. This one goes up to Gen3 x8, which is fine for my purposes. If you only need Gen1/2 capability, the situation is much better.

[Edit by Future Shane] There is one listed on eBay for a good price right now if anyone else is looking for one. (I'll remove this note after it's no longer available.)

The U4301B is actually just the instrument in the bottom slot of the M9505A AXIe Chassis. This is meant to connect to a PCIe slot on a host machine using an iPass cable and interface card. Newer versions of the chassis controller have a laptop-friendly Thunderbolt connection instead. I "upgraded" mine using an eGPU enclosure, the smaller black box sitting on top.

I said that the U4301B was my second-best eBay score of all time, and that's because the number one is the U4322A probe that I got to go with it, from a different auction. The protocol analyzer is useless without a probe or interposer, and those are even harder to find used. I have never seen a U4322A on eBay before or since the one I got, and all other online listings for them are dead ends. So the fact that I got one for what might as well be free compared to the new cost is just plain luck.

It was, however, a lot broken...

The probe has two rows of spring-loaded contacts that are meant to touch down on test pads for the PCIe signals. Unfortunately, mine was missing several pins and many others were bent or broken. It had been treated like a scrap cable, rather than a delicate probe. No problem, though: I can just replace the spring pins with some equivalent Mill-Max parts...

...oh, well shit.

This was one of the most ridiculous things I have ever seen under the microscope. Each spring pin has a surface-mount resistor soldered into its tip, and encased in epoxy. What the multi-GHz fuck is going on with these? Well, I suspect they each make up part of a passive probe, also called a Low-Z or Z0 probe. This video explains the concept in detail; it's forming a resistive divider with the 50Ω termination. But it must have extremely low capacitance on the input side of the resistor, hence the resistors embedded in the tips. The good news is that there are no amplifiers in the probe head, so there's not much else that can be broken.
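To put numbers on that guess: in a Low-Z probe, the tip resistor and the 50Ω input termination form a simple voltage divider, and the circuit under test sees the two in series as a shunt load. The sketch below assumes a hypothetical 450Ω tip resistor, chosen only because it gives a round 10:1 ratio; I haven't measured the actual resistors in the U4322A tips.

```python
# Low-Z (Z0) passive probe model: a series tip resistor feeding a
# 50-ohm terminated input. The 450-ohm tip value is HYPOTHETICAL,
# picked only for a round 10:1 ratio; the U4322A tips were not measured.

TERMINATION_OHMS = 50.0


def attenuation(r_tip_ohms: float, r_term_ohms: float = TERMINATION_OHMS) -> float:
    """Fraction of the probed voltage that reaches the 50-ohm input."""
    return r_term_ohms / (r_tip_ohms + r_term_ohms)


def dc_load_ohms(r_tip_ohms: float, r_term_ohms: float = TERMINATION_OHMS) -> float:
    """Resistance the probe adds from the probed trace to ground
    (the terminated input is the path to ground), ignoring parasitics."""
    return r_tip_ohms + r_term_ohms


if __name__ == "__main__":
    r_tip = 450.0  # hypothetical tip resistor value
    ratio = attenuation(r_tip)
    print(f"{1 / ratio:.0f}:1 attenuation, {dc_load_ohms(r_tip):.0f} ohm load")
```

The resistive load stays large compared with the trace impedance, but this ideal model ignores the capacitance on the probed side of the resistor, which is the part that matters at multi-GHz and presumably why the resistors sit right at the tips.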

There's no replacement for these pins, so the ones that were missing or broken were a lost cause. But luckily there were enough intact ones to make a full bidirectional x4 link, which is all I really needed. They weren't all in the right locations, so I had to carefully rearrange them with a soldering iron, taking care to use as little solder as possible while still making a strong connection. After building the x4 link, I have only a couple of spare pins remaining, so I need to be very careful with this probe.

Actually, the U4322A was not my first choice; what I really wanted was a U4328A M.2 interposer, which taps off the signals at an M.2 connector bridge. But I can convert my basically free U4322A into that using a basically free circuit board. This board just has the test pad footprint for the U4322A in between a short M.2 extension. I carefully mounted the U4322A to the board with standoffs and don't really intend to ever take it off again.

Somewhat to my surprise, this collection of parts actually does work. I was worried that there would be some license nonsense involved, but the instrument license seems to go with the instrument. The host software doesn't require a separate license and worked right away, even through my weird Thunderbolt eGPU enclosure hack. And that's really where the value is. It wouldn't be hard to make an in-system PIPE traffic logger on a Zynq UltraScale+, and I might do that anyway, but parsing and visualizing the data in a convenient way takes a lot of effort. With the LPA Software, you just get nice graph and packet views straight away.
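To give a flavor of the parsing involved, here is a minimal sketch of just the first step a packet view needs: decoding the first DWord of a Transaction Layer Packet header (Fmt, Type, TC, Length). The field positions follow the PCIe base specification's header layout; the function and class names are hypothetical, and the lookup table covers only a handful of common TLP types (no messages, atomics, or prefixes).

```python
from dataclasses import dataclass

# (Fmt, Type) encodings for a few common TLPs. Not exhaustive.
TLP_NAMES = {
    (0b000, 0b00000): "MRd (3DW)",
    (0b001, 0b00000): "MRd (4DW)",
    (0b010, 0b00000): "MWr (3DW)",
    (0b011, 0b00000): "MWr (4DW)",
    (0b000, 0b00100): "CfgRd0",
    (0b010, 0b00100): "CfgWr0",
    (0b000, 0b01010): "Cpl",
    (0b010, 0b01010): "CplD",
}


@dataclass
class TlpDw0:
    fmt: int
    tlp_type: int
    tc: int
    length_dw: int  # payload length for writes, requested length for reads
    name: str

    @property
    def header_dw(self) -> int:
        # Fmt bit 0 selects a 4-DW header (64-bit address) over a 3-DW one.
        return 4 if self.fmt & 0b001 else 3

    @property
    def has_data(self) -> bool:
        # Fmt bit 1 means a data payload follows the header.
        return bool(self.fmt & 0b010)


def decode_dw0(dw0: int) -> TlpDw0:
    """Decode the first header DWord. Assumes byte 0 (Fmt/Type) is the
    most significant byte, matching the spec's header diagrams."""
    fmt = (dw0 >> 29) & 0x7        # byte 0, bits 7:5
    tlp_type = (dw0 >> 24) & 0x1F  # byte 0, bits 4:0
    tc = (dw0 >> 20) & 0x7         # byte 1, bits 6:4
    length = dw0 & 0x3FF           # 10-bit Length field, in DWords
    if length == 0:
        length = 1024              # a Length of 0 encodes 1024 DW
    name = TLP_NAMES.get((fmt, tlp_type),
                         f"unknown (fmt={fmt:03b}, type={tlp_type:05b})")
    return TlpDw0(fmt, tlp_type, tc, length, name)


if __name__ == "__main__":
    for dw in (0x40000001, 0x4A000010, 0x20000000):
        t = decode_dw0(dw)
        print(f"{dw:08X}: {t.name}, {t.header_dw}DW header, {t.length_dw} DW")
```

That's one DWord of one packet; the real work in an analyzer is everything after this: framing, sequence numbers, matching completions to requests, and presenting it all in a way a human can follow.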

This all seems like a lot of effort for probing an interface that's now at least two generations old. All this equipment is outdated and could for sure be replaced with a single-board interposer based on a Zynq UltraScale+. All it needs is two GTH quads, a bunch of RAM, and a high-speed interface to the outside world. But I don't think Keysight or Teledyne LeCroy are interested in that; Gen5 is where the money is. Interestingly, though, the new Keysight Gen5 analyzer is a single-board interposer.
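The "bunch of RAM" part is easy to underestimate. Two GTH quads give eight receivers, which is enough to passively tap both directions of a Gen3 x4 link. The sketch below estimates the raw capture rate after 128b/130b decoding and how long a buffer would last if every symbol were stored. The 4GiB buffer size is an assumption for illustration, not a design I've built.

```python
# Raw capture bandwidth for a HYPOTHETICAL Zynq UltraScale+ interposer that
# taps a Gen3 x4 link with two GTH quads (4 lanes x 2 directions = 8 receivers).
# The buffer size is an assumption for illustration.

GEN3_RATE_TPS = 8e9               # per lane, per direction
ENCODING_EFFICIENCY = 128 / 130   # Gen3 128b/130b line code


def capture_bytes_per_s(lanes: int, directions: int = 2) -> float:
    """Decoded bytes per second if every symbol on every lane is stored."""
    bits_per_s = GEN3_RATE_TPS * ENCODING_EFFICIENCY * lanes * directions
    return bits_per_s / 8


def buffer_fill_time_s(buffer_bytes: int, lanes: int) -> float:
    """How long an unfiltered capture lasts before the buffer is full."""
    return buffer_bytes / capture_bytes_per_s(lanes)


if __name__ == "__main__":
    rate = capture_bytes_per_s(lanes=4)
    buf = 4 * 2**30  # assumed 4 GiB capture buffer
    print(f"x4 bidirectional: {rate / 1e9:.2f} GB/s")
    print(f"4 GiB buffer fills in {buffer_fill_time_s(buf, 4) * 1e3:.0f} ms")
```

At roughly 7.9GB/s, an unfiltered capture fills 4GiB in about half a second, so a practical design would want to discard idle traffic and SKP ordered sets before storing anything.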

But for now I have Gen3 protocol analysis capability, which is good enough for my purposes. I've used it a bunch in the past few months to explore the different layers of the PCIe stack and components within. There are some really interesting parts that I may cover in future posts. But I'll probably start with an overview of the whole stack, and where the available Xilinx IPs fit into it, since even that is a little confusing at first. There are hard and soft (i.e. HDL) components to it, and not every device has an out-of-the-box solution for making the whole stack. That's enough material for an entire post though, so I'll end this one here.