Sunday, November 17, 2019

Zynq Ultrascale+ SuperSpeed RAM Dumping + v0.2 Carrier

I've gotten a lot of mileage out of my v0.1 (very first version) camera PCB. Partly that's because there's not much to it; it's mostly just power supplies, connectors, and differential pairs. But I'm still surprised I haven't broken it yet, and it's only had some minor design issues. I also made a front enclosure for it with an E-mount flange stolen from a macro extension tube (Amazon's cheapest, of course) and slots for some 1/4-20 T-nuts for tripod mounting.

Stealing an E-mount flange from a macro extension tube is maybe my favorite "Amazon's cheapest" hack so far. I'm not even sure how else to do it. Getting a custom CNC flange machined wouldn't be too bad, but what about the leaf springs?
There are some sensor alignment features, but mostly the board just bolts to the back of the front enclosure. The 1/4-20 T-nuts allow for quick and dirty tripod mounting without having to worry about aluminum threads or Heli-Coils.
No real thought was given to connector placement, user interface, battery wiring/charging, cooling, or anything else other than having something to constrain the sensor and lens the right distance from each other and deal with the massive pixel throughput. Still, it's been useful and reliable. At this point, I've tested most of the important hardware and am just about ready to make some functional improvements for v0.2.

One important subsystem I hadn't tested yet, though, is the USB interface. It's not part of the capture pipeline, but it's important that it operate at USB 3.x speeds for reading image data off the SSD later. The Zynq Ultrascale+ has a built-in USB 3.0 PHY using PS-GTR transceivers at 5Gb/s. This isn't quite fast enough for 5:1 compressed image data at full frame rate, but it's more than fast enough for 30fps playback, or direct access for conversion and editing.

At the moment, though, I'm mainly interested in USB 3.0 for reducing the amount of time it takes to get test image sequences out of the PS-side DDR4 RAM. I've so far been using XSCT commands to read blocks of RAM into a file (mrd -bin -file) over JTAG, but this is limited by the 30MHz JTAG interface. That's a theoretical maximum, too. In practice, it takes several minutes to read out even a short image sequence, and up to an hour to dump the entire contents of the RAM (2GiB). This is all for mere seconds of video...
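For reference, this is the sort of XSCT invocation I mean; the file name and range here are illustrative (mrd takes a start address and a length in 32-bit words, so 0x14000000 words covers the 1.25GiB I typically grab):

    # Dump the capture buffer from DDR4 to a file over JTAG.
    # 0x14000000 words x 4 bytes = 0x50000000 bytes = 1.25GiB.
    mrd -bin -file frames.bin 0x20000000 0x14000000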

SuperSpeed RAM Dumping

To remedy this, I repurposed the standalone ZU+ USB mass storage class example to map most of the RAM as a virtual disk, which I can then read with a raw disk image reader (Win32 Disk Imager). This is pretty much what the example does anyway, so my modifications were very minor. So far, I've been able to run my application entirely in On-Chip Memory (OCM), leaving the external DDR4 free for image capture. That means I have to explicitly place the virtual disk in DDR4 in the linker script:
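This is a sketch rather than my exact script: the region and section names are made up, but the origin (0x20000000) and length (0x50000000 bytes, or 1.25GiB) match the virtual disk described below.

    /* Carve out a dedicated DDR4 region for the virtual disk; the
     * application itself stays in OCM. */
    MEMORY
    {
        psu_ocm_ram_0 : ORIGIN = 0xFFFC0000, LENGTH = 0x40000
        virtual_disk  : ORIGIN = 0x20000000, LENGTH = 0x50000000
    }

    SECTIONS
    {
        /* NOLOAD: reserve the address range without having the loader
         * initialize it, so captured data isn't overwritten. */
        .virtual_disk (NOLOAD) : {
            *(.virtual_disk)
        } > virtual_disk
    }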

In the application, the virtual disk array also needs to be correctly sized and assigned to the dedicated memory section using an __attribute__:
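Roughly like so; the symbol and macro names are modeled on the Xilinx example's but should be treated as illustrative:

    #include "xil_types.h"

    /* 1.25GiB virtual disk, pushed out to the dedicated DDR4 section
     * instead of living in OCM with the rest of the data. */
    #define VFLASH_SIZE 0x50000000U

    u8 VirtFlash[VFLASH_SIZE] __attribute__ ((section (".virtual_disk")));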
With that small modification, the application (including the mass storage device driver) runs in OCM RAM, but references a virtual disk array based in external DDR4 at 0x20000000, which is where the image capture data starts. As with the original example, the device shows up as a blank drive of the defined size when plugged into a host. Windows asks to format it, but for now I just click Cancel and use Win32 Disk Imager to read the entire 1.25GiB. This copies the raw contents of the "disk" into a binary file, a process I'm all too familiar with from having to recover files from SD cards with corrupted file systems.

But at first I wasn't getting a SuperSpeed (5Gb/s) connection; it was falling back to High-Speed (480Mb/s) through the external USB3320 PHY. (An external USB 2.0 PHY is required on the ZU+, even for SuperSpeed operation.) To troubleshoot further, I took a look at the DCFG and DSTS registers in the USB module. DCFG indicated a Device Speed of 3'b100 (SuperSpeed), but DSTS indicated a Connection Speed of 3'b000 (High-Speed). I took this to mean that the PS-GTR link to the host was failing, and after some more poking around I found that its reference clock source was configured for the wrong input pins and frequency. In my case, I'm feeding it with a 100MHz reference clock on input 2, so I changed the configuration accordingly.
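For the record, the DCFG/DSTS check that pointed me there is easy to script. A minimal sketch, with the USB3_0 base address and DWC3 register offsets taken from my reading of the ZU+ register reference (verify them before trusting this):

    #include "xil_io.h"
    #include "xil_printf.h"

    #define USB3_0_BASE  0xFE200000U             /* USB3_0 controller */
    #define DWC3_DCFG    (USB3_0_BASE + 0xC700U) /* device configuration */
    #define DWC3_DSTS    (USB3_0_BASE + 0xC70CU) /* device status */

    void print_usb_speeds(void)
    {
        /* Bits [2:0]: 3'b100 = SuperSpeed, 3'b000 = High-Speed. */
        xil_printf("DCFG DevSpd:  %d\r\n", (int)(Xil_In32(DWC3_DCFG) & 0x7U));
        xil_printf("DSTS ConnSpd: %d\r\n", (int)(Xil_In32(DWC3_DSTS) & 0x7U));
    }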


After that, I was able to get a SuperSpeed connection. As a formatted disk drive, I get sequential read speeds of around 300MB/s. Through Win32 Disk Imager, I can read the entire 1.25GiB virtual disk in about seven seconds. So much better! To celebrate, I set off some steel wool fireworks with Bill Kerman. (Steel wool, especially the ultrafine variety, burns quite spectacularly.)


Since I've been putting off the task of NVMe writing, these are still just image sequences that can fit in the RAM buffer. In this case they're actually compressed about 11:1, well beyond my SSD writing requirement, mostly due to the relatively dark and low-contrast scene. The same quantizer settings in a brighter scene with more detail would yield a lower compression ratio. I did separate out the quantizer values for each subband, so I can experiment more with the quality/data rate trade-off.

The most noticeable defects aren't from the wavelet compression; they're just the regular sensor defects. There's definitely some "black sun" artifact in the brightest sparks. There's also a rolling row offset that makes the dark background appear to flicker. I did switch to a different power supply for this test, which could be contributing more electrical noise. In any case, I definitely need to implement row noise correction. The combination of all-intraframe compression and a global shutter does make it pretty good for observing the sometimes crazy behavior of individual sparks, though:

This one was gently falling and then just decided to explode into a dozen pieces, shooting off at 20-30mph.
My favorite, though, is this spark that gets flung off like a pair of binary stars. After a while, they decide to part ways and one goes flying up behind Bill's helmet. The comet-like tails are a motion artifact of the multi-slope exposure mode.
Another thing I learned from this is that I probably need an IR-cut filter. I neglected to record some normal footage of the steel wool burning, but it's nowhere near as bright as it looks here. Much of that is just how human visual perception works. I tried to mitigate it somewhat by using the CMV12000's multi-slope exposure mode to rein in the highlights. But I think there's also some near-infrared adding to the brightness here. I'll have to get an external IR-cut filter to test this theory.

Although the image sequence transfer is 100x faster now, it still takes time to adjust settings and trigger the capture over JTAG. I would very much like to do everything over USB in the near future, at least until I have some form of UI on the camera itself. But I also don't really want to write a custom driver. I might try to abuse the mass storage device driver, since it's already working, by adding in some custom SCSI codes for control. This is also the device class I intend to use eventually as the host interface to the SSD, so I should get to know it well.
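In SCSI terms, opcodes 0xC0-0xFF are reserved for vendor use, so they shouldn't collide with anything the driver already handles. A hypothetical hook, called from the example's command parser before its standard SCSI switch (the opcodes and names here are mine, not from the example):

    #include "xil_types.h"

    /* Made-up vendor-specific opcodes for camera control. */
    #define CAM_SCSI_SET_PARAM  0xC0
    #define CAM_SCSI_TRIGGER    0xC1

    /* Returns 1 if the command block was consumed, 0 to fall through
     * to the normal mass storage SCSI handling. */
    int CameraHandleVendorCB(const u8 *cb)
    {
        switch (cb[0]) {
        case CAM_SCSI_SET_PARAM:
            /* e.g. cb[1] selects a setting, cb[2..5] carry a value */
            return 1;
        case CAM_SCSI_TRIGGER:
            /* arm or trigger a capture */
            return 1;
        default:
            return 0;
        }
    }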

v0.2 Carrier

Controlling the camera over USB is not the most user-friendly way of doing things, as I know from wrangling drivers and APIs for previous camera projects. I could maybe see an exception where a Pixel 2 (modern Pixels don't have USB 3.0 anymore, because smartphone progress makes no fucking sense) hosts the camera, presenting a nice preview image and dedicated touch interface. But that's a large chunk of Android development that I don't want or know how to do.

Instead, I think it makes sense to stick to something extremely simple for now: an HDMI output and some buttons. I would love to have a touchscreen LCD, but they're huge time, money, power, and reliability sinks. They're also never bright enough, or if they are they kill the power and thermal budget. Better to just move the problem off-board, where it can be solved more flexibly depending on the scenario. At least that's what I'll tell myself.

It seems like there are two main ways to do HDMI out from a Zynq SoC. The more modern Zynq Ultrascale+ development boards, like the ZCU106, use a PL-side GTH transceiver to directly drive a TMDS retimer. This supports HDMI 2.0 (4K), but would rule out the cheaper TE0803 XCZU4 board, since its four PL-side transceivers are already in use for the SSD. The second method uses a dedicated HDMI transmitter like the Analog Devices ADV7513 as an external PHY, which interfaces to the Zynq over a wide logic-level pixel bus. Even though it only goes up to 1080p, this sounds more like what I want for now. I just need a reasonable preview image.
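The ADV7513 gets configured over I2C, and bring-up is mostly a handful of register writes. Here's a minimal sketch using the PS I2C driver; the main-map address (0x39) and the register/value pairs are from my reading of the ADV7513 programming guide's quick-start and fixed-register tables, so check them against the guide before use:

    #include "xiicps.h"
    #include "xparameters.h"

    #define ADV7513_ADDR 0x39 /* main register map, 7-bit I2C address */

    static XIicPs Iic;

    static int adv7513_write(u8 reg, u8 val)
    {
        u8 buf[2] = { reg, val };
        return XIicPs_MasterSendPolled(&Iic, buf, 2, ADV7513_ADDR);
    }

    int adv7513_init(void)
    {
        XIicPs_Config *cfg = XIicPs_LookupConfig(XPAR_XIICPS_0_DEVICE_ID);
        if (cfg == NULL ||
            XIicPs_CfgInitialize(&Iic, cfg, cfg->BaseAddress) != XST_SUCCESS) {
            return XST_FAILURE;
        }
        XIicPs_SetSClk(&Iic, 100000); /* 100kHz is plenty for setup */

        adv7513_write(0x41, 0x10); /* power up the transmitter */

        /* "Fixed" registers the programming guide requires after
         * power-up (values from my reading of the guide): */
        adv7513_write(0x98, 0x03);
        adv7513_write(0x9A, 0xE0);
        adv7513_write(0x9C, 0x30);
        adv7513_write(0x9D, 0x61);
        adv7513_write(0xA2, 0xA4);
        adv7513_write(0xA3, 0xA4);
        adv7513_write(0xE0, 0xD0);
        adv7513_write(0xF9, 0x00);

        /* Video format setup (24-bit RGB 4:4:4 input, sync polarity,
         * HDMI/DVI mode) would follow, matching the pixel bus wiring. */
        return XST_SUCCESS;
    }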

HDMI output subsystem based on the ADV7513.
I had left a bunch of unused pins in the top right corner expecting to need a wide logic-level pixel bus, either for an LCD or an HDMI transmitter. The tricky part was finding room for the connector and IC. I decided to ditch the microSD card holder, which had a bad footprint anyway, to make the space. Without growing the board, I can fit a full-size (Type A) HDMI connector on the top side and the ADV7513 plus supporting components on the bottom. The TMDS lines do have to change layers once, but they're short and length-matched so I think they'll be okay.

At the same time, I also rerouted a good portion of the right edge of the board. The port I've been using for UART terminal access is gone, replaced by a more general-purpose optically-isolated I/O connector. This can still be used for terminal access, or as a trigger/sync signal. I also added a barrel jack connector for power/charge input. Finally, a 0.1" header off the back of the board has the battery power input and some unprotected I/O for two buttons, a rotary encoder, and a red "recording" LED on a separate board. This UI board would be mounted to the top face, right-hand side, where such things would typically be on a camera.

New right-edge connector layout and top face UI board header.
I consider this to be the bare minimum design for standalone functionality. It will need a simple menu and status overlay on the HDMI output. I'm also skipping any BMS or charge circuitry for now, so the battery must be self-contained (like this 3-cell pack) and charged by a CC/CV adapter. It's well within the power range of USB-C charging, so that could be an option in the future, but I don't think it's important enough for this revision.

One of the reasons I don't mind doing more small iterations rather than trying to cram features into one big revision is that I have been able to get these boards relatively fast and cheap from JLCPCB. Originally, I chose their service because they were the first and only place I found with a standard impedance-controlled stack-up, complete with an online calculator. But it's also just the most economical way to get a six-layer impedance controlled board in under two weeks. Each one is around $30. Even including all the power supplies and interfaces, the board is really a minor cost compared to the sensor and SoM it carries.

Other than that, only one minor fix was needed, for the SSD's PCIe reference clock. I had mistakenly assumed this could be generated or forwarded by the ZU+ out of one of its GT clock pairs, but that doesn't seem to be standard practice. Instead, an external clock generator distributes matching reference clocks to the ZU+ GT clock input and the SSD. I hacked this onto v0.1 with some twisted-pair blue wire surgery, but it was easy to reroute for v0.2. Aside from this, I didn't touch any of the differential pairs, or really any other part of the board. Well, I did add one more small component...but that'll be for much later.

These boards should arrive in time for a Thanksgiving weekend soldering session. I plan to build up two this time: one monochrome and, if all goes well, finally, one color. Before then, I'd like to have at least some plan for the NVMe write...
