Wednesday, January 20, 2016

GS3 / Surfacecam GPU-Accelerated Preview

I've been further evolving the FlyCapture-based capture software for my Grasshopper3 camera system. This step involved merging in some work I had done to create a GPU-accelerated RAW file viewer. The viewer opens RAW files of the type saved out by the capture software and processes them through debayer, color correction, and convolution (sharpen/blur) pixel shaders. It was my first GPU-accelerated coding experience and I gained an appreciation for just how fast the GPU could do image processing tasks that take many milliseconds if done in software.

Some of the early modifications I made to the FlyCapture demo program were to reduce the frequency of the GDI-based UI thread, limit the size of the preview image, and force the preview to use only the easiest debayer algorithm (nearest-neighbor). This cut down the CPU utilization enough that the capture thread could buffer to RAM at full frame rate without getting bogged down by the UI thread. This was especially important on the Microsoft Surface Pro 2, my actual capture device, which has fewer cores to work with.

Adding the GPU debayer and color correction into the capture software lets me undo a lot of those restrictions. The GPU can run a nice debayer algorithm (this is the one I like a lot), do full color correction and sharpening, and render the preview to an arbitrarily large viewport. The CPU is no longer needed to do software debayer or GDI bitmap drawing. Its only responsibility is shuttling raw data to the GPU in the form of a texture. More on that later. This is the new, optimized architecture:

Camera capture and saving to the RAM buffer is unaffected. RAM buffer writing to disk is also unaffected. (Images are still written in RAW format, straight from the RAM buffer. GPU processing is for preview only.) I did simplify things a lot by eliminating all other modes of operation and making the RAM buffer truly circular, with one thread for filling it and one for emptying it. It's nice when you can delete 75% of your code and have the remaining bits still work just as well.

The UI thread has the new DirectX-based GPU interface. Its primary job now is to shuttle raw image data from the camera to the GPU. The mechanism for doing this is via a bound texture - a piece of memory representing a 2D image that both the CPU and the GPU have access to (not at the same time). This would normally be projected onto a 3D object but in this case it just gets rendered to a full-screen rectangle. The fastest way to get the data into a texture is to marshal it in with raw pointers, something C# allows you to do only within the context of the "unsafe" keyword...I wonder if they're trying to tell you something.

The textures usually directly represent a bitmap. So, most of the texture formats allow for three-color pixel formats such as R32G32B32A32 (32 bits of floating point each for Red, Green, Blue, and Alpha). Since the data from the camera represents raw Bayer-masked pixels, I have had to abuse the pixel formats a little. For 8-bit data, it's not too bad. I am using R8_UNorm format, which just takes an unsigned 8-bit value and normalizes it to the range 0.0f to 1.0f. 

12-bit is significantly more complicated, since there are no 12- or 24-bit pixel formats into which one or two pixels can be stuffed cleanly. Instead, I'm using R32G32B32_UInt and Texture.Load() instead of Texture.Sample(). This allows direct bitwise unpacking of the 96-bit pixel data, which actually contains eight adjacent 12-bit pixels. And I do mean bitwise...the data goes through two layers of rearrangement on its way into the texture, each with its own quirks and endianness, so there's no clean way to sort it out without bitwise operations.

This might be something like what's actually going on.
In order to accommodate both 8-bit and 12-bit data, I added an unpacking step that is just another pixel shader that converts the raw data into a common 16-bit single-color format before it goes into the debayer, color correction, and convolution shader passes just like in the RAW file viewer. The shader file is linked below for anyone who's interested.
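
As a rough illustration of the unpacking step, here is a Python sketch (not the actual shader) that pulls eight 12-bit pixels out of three 32-bit words and scales 8-bit or 12-bit values to a common 16-bit range. The bit ordering is an assumption picked for simplicity (pixels packed LSB-first, with word 0 holding the lowest bits); the real camera ordering is quirkier, as noted above. The function names are hypothetical.

```python
def unpack_12bit(words):
    """Unpack three 32-bit words (96 bits) into eight 12-bit pixel values.

    Assumed layout: word 0 holds the lowest bits, pixels packed LSB-first.
    The real camera data is ordered differently.
    """
    assert len(words) == 3
    packed = words[0] | (words[1] << 32) | (words[2] << 64)
    return [(packed >> (12 * i)) & 0xFFF for i in range(8)]


def to_16bit(value, bits):
    """Scale an 8- or 12-bit value so that full scale maps to 0xFFFF."""
    max_in = (1 << bits) - 1
    return (value * 0xFFFF) // max_in
```

In a shader the same idea would be expressed with shifts and masks on the three UInt channels returned by Texture.Load().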

The end result of all this is I get a cheaply-rendered high-quality preview in the capture program, up to full screen, which looks great on the Surface 2:

Once the image is in the GPU, there's almost no end to the amount of fast processing that can be done with it. Shown in the video above is a feature I developed for the RAW viewer, saturation detection. Any pixel that is clipped in red, green, or blue value because of overexposure or over-correction gets its saturated color(s) inverted. In real time, this is useful for setting up the exposure. Edge detection for focus assist can also be done fairly easily with the convolution shader. The thread timing diagnostics show just how fast the UI thread is now. Adding a bit more to the shaders will be no problem, I think.
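
The saturation detection logic is simple enough to sketch in a few lines. This is a Python approximation of the idea, not the shader code itself, assuming normalized color values where 1.0 is the clip level. The function name is hypothetical.

```python
def mark_saturated(rgb, clip=1.0):
    """Invert any channel that has reached the clip level; pass others through."""
    out = []
    for c in rgb:
        c = min(c, clip)                           # anything past the clip level is clamped
        out.append(clip - c if c >= clip else c)   # invert only the clipped channels
    return tuple(out)
```

With this version, a fully blown-out white pixel turns black, and a pixel clipped only in red loses its red entirely, so the overexposed areas stand out clearly in the preview.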

For now, here is the shader file that does 8-bit and 12-bit unpacking (12-bit is only valid for this camera...other cameras may have different bit-ordering!), as well as the debayer, color correction, and convolution shaders.

Shaders: debayercolor_preview.fx

Saturday, November 7, 2015

DRSSTC Δt5: MIDI Streaming and Shutter Syncing

Until last week, I hadn't really touched my Tesla coil setup since I moved out to Seattle. Maybe because the next step was a whole bunch of software writing. As of Δt3, I had written a little test program that could send some basic frequency (resonant and pulse generation) and pulse shaping commands to the driver. But it was just for a fixed frequency of pulse generation and of course I really wanted to make a multi-track MIDI driver for it.

The number I had in mind was three tracks, to match the output capabilities of MIDI Scooter. While the concept was cool, the parsing/streaming implementation was flawed and the range of notes that you can play with a motor is kinda limited by the power electronics and motor RPM. So I reworked the parser and completely scrapped and rebuilt the streaming component of it. (More on that later.) Plus I did a lot of preliminary thinking on how best to play three notes using discrete pulses. As it turns out, the way that works best in most scenarios is also the easiest to code, since it uses the inherent interrupt prioritization and preemption capabilities that most microcontrollers have.

So despite my hesitation to start on the new software, it actually turned out to be a pretty simple coding task. It did require a lot of communication protocol code on both the coil driver and the GUI, to support sending commands and streaming MIDI events at the same time. But it went pretty smoothly. I don't think I've written as many lines of code before and had them mostly work on the first try. And the result is a MIDI parser/streamer that I can finally be proud of. Here it is undergoing my MIDI torture test, a song known only as "Track 1" from the SNES game Top Gear.

The note density is so high that it makes a really good test song for MIDI streaming. I only wish I had even more tracks...

The workflow from .mid file to coil driver is actually pretty similar to MIDI Scooter. First, I load and parse the MIDI, grouping events by track/channel. Then, I pick the three track/channel combinations I want to make up the notes for the coil. These get restructured into an array with absolute timestamps (still in MIDI ticks). The array is streamed wirelessly, 64 bytes at a time, to a circular buffer on the coil driver. The coil driver reports back what event index is currently playing, so the streamer can wait if the buffer is full.
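
The streaming side of that workflow can be sketched roughly as follows. The 8-byte event layout, the buffer capacity, and all of the names here are assumptions for illustration; only the 64-byte packet size and the "wait if the buffer is full" behavior come from the description above.

```python
import struct

EVENT_FMT = "<IBBBx"   # hypothetical event: tick (uint32), track, note, velocity, pad byte
EVENT_SIZE = struct.calcsize(EVENT_FMT)          # 8 bytes
PACKET_BYTES = 64
EVENTS_PER_PACKET = PACKET_BYTES // EVENT_SIZE   # 8 events per packet


def make_packets(events):
    """Pack (tick, track, note, velocity) tuples into 64-byte packets."""
    packets = []
    for i in range(0, len(events), EVENTS_PER_PACKET):
        chunk = events[i:i + EVENTS_PER_PACKET]
        payload = b"".join(struct.pack(EVENT_FMT, *e) for e in chunk)
        # The last packet is zero-padded; a real protocol would need a
        # length field or end marker to tell padding from events.
        packets.append(payload.ljust(PACKET_BYTES, b"\x00"))
    return packets


def can_send(next_index, playing_index, capacity):
    """True if one more packet fits without overwriting unplayed events."""
    buffered = next_index - playing_index
    return buffered + EVENTS_PER_PACKET <= capacity
```

The streamer would call can_send() with the event index most recently reported by the coil driver and hold off on the next packet until it returns True.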

On the coil driver itself, there are five timers. In order of interrupt priority:

  • The pulse timer, which controls the actual gate drive, runs in the 120-170kHz range and just cycles through a pre-calculated array of duty cycles for pulse shaping. It's always running, but it only sets duty cycles when a pulse is active. 
  • Then, there are three note timers that run at lower priority. Their rollover frequencies are the three MIDI note frequencies. When they roll over, they configure and start a new pulse and then wait for the pulse to end (including ringdown time). They're all equal priority interrupts, so they can't preempt each other. This ensures no overlap of pulses.
  • Lastly, there's the MIDI timer, which runs at the MIDI tick frequency and has the lowest interrupt priority. It checks for new MIDI events and updates the note timers accordingly. I'm actually using SysTick for this (sorry, SysTick) since I ran out of normal timers.
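
To give a feel for the note timers, here is a small Python sketch of how a MIDI note number maps to a rollover period. The note-to-frequency formula is the standard equal-temperament one; the 1 MHz timer clock and the function names are assumptions, not values from the actual driver.

```python
def midi_note_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def timer_reload(note, timer_clock_hz=1_000_000):
    """Timer counts per rollover for a note (the 1 MHz clock is an assumption)."""
    return round(timer_clock_hz / midi_note_to_hz(note))
```

On the MIDI timer's interrupt, a note-on event would write the new reload value into one of the three note timers, and a note-off would stop that timer.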
There are three levels of volume control involved as well. Relative channel volume is set by configuring the pulse length (how many resonant periods each pulse lasts). But since the driver was designed to be hard-switched, I'm also using duty cycle control for individual note volume. And there is a master volume that scales all the duty cycles equally. All of this is controlled through the GUI, which can send commands simultaneously while streaming notes, as shown in the video.

It's really nice to have such a high level of control over the pulse generation. For example, I also added a test mode that produces a single long pulse with gradually ramped duty cycle. This allows for making longer, quieter sparks with low power...good for testing.

I also got to set up an experiment that I've wanted to do ever since I got my Grasshopper 3 camera. The idea is to use the global shutter and external trigger capabilities of the industrial camera to image a Tesla coil arc at a precise time. Taking it one step further, I also have my favorite Tektronix 2445 analog oscilloscope and a current transformer. I thought that it would be really cool to have the scope trace of primary current and the arc in the same image at the same time, and then to sweep the trigger point along the duration of the pulse to see how they both evolve.

The setup for this was a lot of fun.

Camera is in the foreground, taped to a tripod because I lost my damn tripod adapter.
Using a picture frame glass as a reflective surface with a black background (and a neon bunny!).
I knew I wanted to keep the scope relatively far from the arc itself, but still wanted the image of the scope trace to appear near the spark and be in focus. So, I set up a reflective surface at a 45º angle and placed the scope about the same distance to the left as the arc is behind the mirror, so they could both be in focus. When imaged straight on, they appear side by side, but the scope trace is horizontally flipped, which is why the pulse progresses from right to left.

This picture is a longer exposure, so you can see the entire pulse on the scope. To make it sweep out the pulse and show the arc condition, I set the exposure to 20-50μs and had the trigger point sweep from the very start of the pulse to the end on successive pulses. So, each frame is actually a different pulse (should be clear from the arcs being differently-shaped) but the progression still shows the life cycle of the spark, including the ring-up period before the arc even forms. The pulse timer fires the trigger at the right point in the pulse through a GPIO on the microcontroller. Luckily, the trigger input on the camera is already optocoupled, so it didn't seem to have any issues with EMI.
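
The trigger sweep amounts to a schedule of delays, one per pulse. A minimal Python sketch, with a hypothetical function name and with the pulse length and step size left as parameters since the actual values aren't given here:

```python
def trigger_delays_us(pulse_length_us, step_us):
    """Trigger delay (from pulse start) for each successive pulse in the sweep."""
    return list(range(0, pulse_length_us + 1, step_us))
```

Each delay in the list is used for one pulse, so the resulting frame sequence steps through the pulse from ring-up to ringdown even though every frame comes from a different spark.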

Seeing the pulse shape and how it relates to arc length is really interesting. It might be useful for tuning to be able to see primary current waveform and arc images as different things are adjusted. No matter what, the effect is cool and I like that it can only really be done with old-school analog methods and mirrors (no smoke, yet).

Monday, September 21, 2015

GS3 / SurfaceCam Multipurpose Adapter

It's been a while since I've made anything purely mechanical, so I had a bit of fun this weekend putting together a multipurpose adapter for my Grasshopper 3 camera.

The primary function of the adapter is to attach the camera to a Microsoft Surface Pro tablet, which acts as the monitor and recorder via USB 3.0. I was going to make a simple adapter but got caught up in linkage design and figured out a way to make it pivot 180º to fold flat in either direction.

Some waterjet-cut parts and a few hours of assembly later, and it's ready to go. The camera cage has 1/4-20 mounts on top and bottom for mounting to a tripod, or attaching accessories, like an audio recorder in this case. There's even a MōVI adapter for attaching just the Surface Pro 2 to an M5 handlebar for stabilizer use. (The camera head itself goes on the gimbal inside a different cage, if I can ever find a suitable USB 3.0 cable.)

Anyway, quick build, quick post. Here are some more recent videos I've done with the GS3 and my custom capture and color processing software.

Plane spotting at SeaTac using the multipurpose adapter and a 75mm lens (250mm equivalent).

Slow motion weed trimming while testing out an ALTA prototype. No drones were harmed in the making of this video.

Freefly BBQ aftermath. My custom color processing software was still a bit of a WIP at this point.

Saturday, January 17, 2015

Three-Phase Color

I was doing a bit more work on my DirectX-based .raw image viewer when I came across a nice mathematical overlap with three-phase motor control theory. It has to do with conversion from red/green/blue (RGB) to hue/saturation/lightness (HSL), two different ways of representing color. Most of the conversion methods are piecewise-linear, with max(), min(), and conditionals to break up the color space. But I figured motors are round and color wheels are round, so maybe I would try applying a motor phase transform to [R, G, B] to see what happens.

The transform of interest is the Clarke transform, which converts a three-phase signal into two orthogonal components (α and β) and a zero-sequence component (γ) that is just the average of the three phases. In motor control with symmetric three-phase signals, γ is usually zero. Applied to [R, G, B], it's just the intensity, one measure of lightness.

In motor control, it's common to find the phase and magnitude of the vector defined by α and β, for example to determine the amplitude and electrical angle of sinusoidal back EMF in a PMSM. It turns out the phase and magnitude are useful in color space as well, representing the hue and saturation, respectively. It might not be exactly adherent to the definition of these terms, but rather than rambling on about hexagons and circles, I'll just say it is close enough for me. (The Wikipedia article's alternate non-hexagon hue (H2) and chroma (C2) calculation is exactly the Clarke transform and magnitude/phase math.)

So I added this hue and saturation adjustment method to the raw viewer's pixel shader:

I'm particularly happy about the fact that it occupies barely 15 lines of HLSL code:

// Clarke Transform Color Processing:
c_alpha = 0.6667f * tempcolor.r - 0.3333f * tempcolor.g - 0.3333f * tempcolor.b;
c_beta = 0.5774f * tempcolor.g - 0.5774f * tempcolor.b;
c_gamma = 0.3333f * tempcolor.r + 0.3333f * tempcolor.g + 0.3333f * tempcolor.b;
c_hue = atan2(c_beta, c_alpha);
c_sat = sqrt(pow(abs(c_alpha), 2) + pow(abs(c_beta), 2));
c_sat *= saturation;
c_hue += hue_shift;
c_alpha = c_sat * cos(c_hue);
c_beta = c_sat * sin(c_hue);

tempcolor.r = c_alpha + c_gamma;
tempcolor.g = -0.5f * c_alpha + 0.8660f * c_beta + c_gamma;
tempcolor.b = -0.5f * c_alpha - 0.8660f * c_beta + c_gamma;

I doubt it's the most computationally efficient way to do it (with the trig and all), but it does avoid a bunch of conditionals from the piecewise methods. And as I mentioned in the last post, the pixel shader is far from the performance bottleneck of the viewer right now.
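
For anyone who wants to poke at the math outside of a shader, here is a Python port of the same steps. It uses exact constants (1/3, 1/√3, √3/2) in place of the rounded HLSL literals, and the function name is hypothetical.

```python
import math


def adjust_hue_sat(r, g, b, saturation=1.0, hue_shift=0.0):
    """Clarke-transform hue/saturation adjustment on one RGB pixel."""
    # Forward Clarke transform: alpha/beta carry the color, gamma the intensity.
    alpha = (2.0 * r - g - b) / 3.0
    beta = (g - b) / math.sqrt(3.0)
    gamma = (r + g + b) / 3.0
    # Phase is hue, magnitude is saturation.
    hue = math.atan2(beta, alpha) + hue_shift
    sat = math.hypot(alpha, beta) * saturation
    alpha = sat * math.cos(hue)
    beta = sat * math.sin(hue)
    # Inverse Clarke transform back to RGB.
    half_sqrt3 = math.sqrt(3.0) / 2.0
    return (alpha + gamma,
            -0.5 * alpha + half_sqrt3 * beta + gamma,
            -0.5 * alpha - half_sqrt3 * beta + gamma)
```

With the defaults the function returns its input, a saturation of zero collapses every pixel to gray at its average intensity, and a hue shift of 120° (2π/3 radians) rotates pure red into pure green, just like advancing a motor by one phase.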

Updated HLSL Source: debayercolor.fx

Updated Viewer Source (VB 2012 Project):
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

And for fun, here's some 150fps video of a new kitchen appliance I just received and hope to put to good use soon:

Saturday, December 27, 2014

Fun with Pixel Shaders

One of the things that saved my ass for MIT Mini Maker Faire was SlimDX, a modern (.NET 4.0) replacement for Microsoft's obsolete Managed DirectX framework. I was only using it to replace the managed DirectInput wrapper I had for interfacing to USB HID joysticks for controlling Twitch and 4pcb. For that it was almost a drop-in replacement. But the SlimDX framework also allows for accessing almost all of DirectX from managed .NET code.

I've never really messed with DirectX, even though at one point long ago I wanted to program video games and 3D stuff. The setup always seemed daunting. But with a managed .NET wrapper for it, I decided to give it a go. This time I'm not using it to make video games, though. I'm using it to access GPU horsepower for simple image processing.

The task is as follows:

The most efficient way to capture and save images from my USB3.0 camera is as raw images. The file is a binary stream of 8-bit or 12-bit pixel brightnesses straight from the Bayer-masked sensor. If one were to convert these raw pixel values to a grayscale image, it would look like this:

Zoom in to see the checkerboard Bayer pattern. It's lost a bit in translation to a grayscale JPEG, but you can still see it on the car and in the sky.

The Bayer filter encodes color not on a per-pixel basis, but in the average brightness of nearby pixels dedicated to each color (red, green, and blue). This always seemed like cheating to me; to arrive at a full color, full resolution image, 200% more information is generated by interpolation than was originally captured by the sensor. But the eye is more sensitive to brightness than to color, so it's a way to sense and encode the information more efficiently.

Anyway, deriving the full color image from the raw Bayer-masked data is a bit of a computationally-intensive process. In the most serial implementation, it would involve four nested for loops to scan through each pixel looking at the color information from its neighboring pixels in each direction. In pseudo-code:

// Scan all pixels.
for y = 0 to (height - 1)
 for x = 0 to (width - 1)
  Reset weighted averages.  

  // Scan a local window of pixels.
  for dx = -N to +N
   for dy = -N to +N
    brightness = GetBrightness(x+dx, y+dy)
    Add brightness to weighted averages.

  Set colors of output (x, y) by weighted averages.


The window size (2N+1)x(2N+1) could be 3x3 or 5x5, depending on the algorithm used. More complex algorithms might also have conditionals or derivatives inside the nested for loop. But the real computational burden comes from serially scanning through x and y. For a 1920x1080 pixel image, that's 2,073,600 iterations. Nest a 5x5 for loop inside of that and you have 51,840,000 times through. This is a disaster for a CPU. (And by disaster, I mean it would take a second or two...)
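
To make the pseudo-code concrete, here is a runnable Python version of about the simplest possible debayer: for each output pixel, average the same-color neighbors in a 3x3 window. It assumes an RGGB layout and an image at least 2x2 pixels, and it is only a stand-in to show the loop structure, not the algorithm used in the viewer.

```python
def debayer_3x3(raw, width, height):
    """Average same-color neighbors in a 3x3 window (assumes an RGGB layout)."""
    def channel(x, y):
        if y % 2 == 0:
            return 0 if x % 2 == 0 else 1   # even rows: R G R G ...
        return 1 if x % 2 == 0 else 2       # odd rows:  G B G B ...

    out = []
    for y in range(height):                 # scan all pixels
        row = []
        for x in range(width):
            sums, counts = [0.0] * 3, [0] * 3   # reset the averages
            for dy in (-1, 0, 1):           # scan the local window
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        c = channel(nx, ny)
                        sums[c] += raw[ny][nx]
                        counts[c] += 1
            row.append(tuple(s / n for s, n in zip(sums, counts)))
        out.append(row)
    return out
```

Every output pixel is computed from the raw input alone, which is exactly the property that lets the two outer loops be handed off to the GPU.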

But the GPU is ideally-suited for this task since it can break up the outermost for loops and put them onto a crapload of miniature parallel processors with nothing better to do. This works because each pixel's color output is independent - it depends only on the raw input image. The little piece of software that handles each pixel's calculations is called a pixel shader, and it's probably the most exciting new software tool I've learned in a long time.

For my very first pixel shader, I've written a simple raw image processor. I know good solutions for this already exist, and previously I would use IrfanView's Formats plug-in to do it. But it's much more fun to build it from scratch. I'm sure I'm doing most of the processing incorrectly or inefficiently, but at least I know what's going on under the hood.

The shader I wrote has two passes. The first pass takes as its input the raw Bayer-masked image and calculates R, G, and B values for each pixel based on the technique presented in this excellent Microsoft Research technical article. It then does simple brightness, color correction, contrast, and saturation adjustment on each pixel. This step is a work-in-progress as I figure out how to define the order of operations and what techniques work best. But the framework is there for any amount of per-pixel color processing. One nice thing is that the pixel shader works natively in single-precision floating point, so there's no need to worry about bit depth of the intermediate data for a small number of processing steps.

The second pass implements an arbitrary 5x5 convolution kernel, which can be used for any number of effects, including sharpening. The reason this is done as a second pass is because it requires the full-color output image of the first pass as its input. It can't be done as part of a single per-pixel operation with only the raw input image. So, the first pass renders its result to a texture (the storage type for a 2D image), and the second pass references this texture for its 5x5 window. The output of the second pass can either be rendered to the screen, or to another texture to be saved as a processed image file, or both.
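
The second pass boils down to the following, shown here in Python on a single channel. The edge handling (clamping to the nearest valid pixel) and the particular sharpening kernel are assumptions for illustration; the shader's actual kernel values are whatever gets passed in.

```python
def convolve5x5(image, kernel):
    """Apply a 5x5 kernel to a 2D list of floats, clamping at the edges.

    The kernel is not flipped, which makes no difference for symmetric kernels.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(5):
                for kx in range(5):
                    sy = min(max(y + ky - 2, 0), height - 1)
                    sx = min(max(x + kx - 2, 0), width - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out


def sharpen_kernel(amount):
    """Identity plus `amount` times (identity minus a 5x5 box blur)."""
    k = [[-amount / 25.0] * 5 for _ in range(5)]
    k[2][2] += 1.0 + amount
    return k
```

The kernel weights sum to one, so flat areas pass through unchanged and only edges get exaggerated. An amount of zero gives the identity kernel.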

What a lovely Seattle winter day.
Even though the pixel shader does all of the exciting work, there's still the matter of wrapping the whole thing in a .NET project with SlimDX providing the interface to DirectX. I did this with a simple VB program that has a viewport, folder browser, and some numeric inputs. For my purposes, a folder full of raw images goes together as a video clip. So being able to enumerate the files, scan through them in the viewer, and batch convert them to JPEGs was the goal.

Hrm, looks kinda like all my other GUIs...

As it turns out, the pixel shader itself is more than fast enough for real-time (30fps) processing. The time consuming parts are loading and saving files. Buffering into RAM would help if the only goal was real-time playback, but outputting to JPEGs is never going to be fast. As it is, for a 1920x1200 image on my laptop, the timing is roughly 30ms to load the file, an immeasurably short amount of time to actually run both pixel shader passes, and then 60ms to save the file. To batch convert an entire folder of 1000 raw images to JPEG, including all image processing steps, took 93s (10.75fps), compared to 264s (3.79fps) in IrfanView.

Real-time scrubbing on the MS Surface Pro 2, including file load and image processing, but not including saving JPEGs.

There are probably ways to speed up file loading and saving a bit, but even as-is it's a good tool for setting up JPEG image sequences to go into a video editor. The opportunity to do some image processing in floating point, before compression, is also useful, and it takes some of the load off the video editor's color correction.

I'm mostly excited about the ability to write GPU code in general. It's a programming skill that still feels like a superpower to me. Maybe I'll get back into 3D, or use it for simulation purposes, or vision processing. Modern GPUs have more methods available for using memory that make the cores more useful for general computing, not just graphics. As usual with new tools, I don't really know what I'll use it for, I just know I'll use it.

If you're interested in the VB project and the shader source (both very much works-in-progress), they are here:

VB 2012 Project:
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

HLSL Source: debayercolor.fx

P.S. I chose DirectX because it's what I have the most experience with. (I know somebody will say, "You should use OpenGL.") I'm sure all of this could be done in OpenGL / GLSL as well.

Saturday, October 18, 2014

MIT Mini Maker Faire 2014

I made a quick trip back to Boston/Cambridge for the first ever MIT Mini Maker Faire. Recap and pictures below, but first here is some video from the event:

As expected from an MIT Maker Faire, there were lots of electric go-karts, Tesla coils, 3D printers, robots, and... scooter...things.
To this I contributed a set of long-time Maker Faire veteran projects (Pneu Scooter, 4pcb, Twitch) and a couple of new things (Talon v2 multirotor, Grasshopper3 Camera Rig). I always like to bring enough projects that if some don't work or can't be demonstrated, I have plenty of back-ups. Fixing stuff on the spot isn't really possible when you have a constant stream of visitors. But I've been to a number of Maker Faires and decided the maximum number of projects I can keep track of is five. Especially since this time I had to be completely mobile, as in airline mobile.

Arriving at the venue, MIT's North Court, luggage in the foreground, MIT Stata Center in the background.
The travel requirement meant that, unfortunately, tinyKart transport was out. (Although it is theoretically feasible to transport it for free via airline except for the battery and the seat...) But Pneu Scooter is eminently flyable and in fact has been all over the world in checked baggage already. It collapses to about 30" long and weighs 18lbs. The battery is well within TSA travel limits for rechargeable lithium ion batteries installed in a device. Oh, and Twitch fits right between the deck and the handlebar:

It was definitely designed that way...
Pneu Scooter and Twitch are really all I should ever bring to Maker Faires. They are low-maintenance and very reliable; both have survived years of abuse. In fact, Pneu Scooter is almost four years old now...still running the original motor and A123 battery pack, and still has a decent five-mile range. (I range-tested it before I left.) It's been through a number of motor controllers and wheels though. Because the tires are tiny, it's always been a pain in the ass accessing the Schrader valves with normal bike pumps. Turns out it just took five minutes of Amazon Googling (Is that a thing?) to solve that problem:

Right-angle Schrader check valve. Why have I not had this forever?
Pneu Scooter survived the rainy Faire with no issues - it's been through much worse. I participated in the small EV race featuring 2- and 3-wheel vehicles. Unfortunately I didn't get any video of it, but Pneu Scooter came in third or something...I wasn't keeping track and nobody else was either. Mostly I was occupied by trying to avoid being on the outside of the drift trike:

Yes, those red wheels are casters...
But if I had to pick one project that I could pretty much singly count on for Maker Faire duty, it's Twitch.

Despite the plastic Vex wheels, Twitch has been pretty durable over the years. I had planned to spend a few days rebuilding it since I thought one of the motors was dead, but when I took it off the shelf to inspect, it was all working fine. In fact, the only holdup for getting it Faire-ready was that the DirectInput drivers I have been using since .NET 1.0 to read in Twitch's USB gamepad controller are no longer supported by Windows 7/8. Yes, Twitch outlasted a Microsoft product lifecycle... Anyway, after much panic, I found a great free library called SlimDX that offers an API very similar to the old Managed DirectX library, so I was back in action.

Basically, Twitch is an infinite source of entertainment. I spent a lot of the Faire just driving it around the crowd from afar and watching people wonder if it's autonomous... I would also drive it really slowly in one orientation, wait for a little kid to attempt to step on it (they always do), and then change it to the other orientation and dart off sideways. And then there is just the normal drifting and driving that is unlike any other robot most people have seen. I found an actual use for the linkage drive too - when it would get stuck with two wheels on pavement and two wheels on grass, it was very easy to just rotate the wheels 90º and get back on the walkway. Seattle drivers need this feature for when it snows...

Twitch is definitely my favorite robot. Every time I take it out, it gets more fun to drive. I have 75% of the parts I need to make a second, more formidable version... This Maker Faire was enough to convince me that it needs to be finished.

4pcb was a bit of a dud. I don't know if my standards for flying machines have just gotten higher or if it always flew as badly as it did during my pre-Faire flight test. It still suffers from a really, really bad structural resonance that kills the efficiency and messes with the gyros.

It was one of the first, or maybe the first PCB quadrotor with brushless motor drivers. But the Toshiba TB6588FG drivers are limited in what they can do, as is the Arduino Pro Mini that runs the flight control. Basically, it's time for a v2 that leverages some new technology and also improves the mechanical design - maybe going to 5" props as well. We'll see...

And unfortunately, because of the rain and crowds, I didn't get to do any aerial video with my new Talon copter. But it looks good and works quite nicely, for a ~$300 all-up build. (Not including the GoPro.) Here's some video I shot with my dad in North Carolina that I had queued up to show people at the Faire.

Talon v2, son of Kramnikopter.
Electric linkage drive scooter my Kickstarter plz.
The last thing I brought for the Faire was my Grasshopper 3 camera setup with the custom recording software I've been working on for the Microsoft Surface. With this and the Edgerton Center's new MōVI M5, I got to do a bit of high speed go-kart filming and other Maker Faire documentation. The videos above were all created with this setup.

I had a stand, but this seemed easier at the time...
As a mobile, stabilized, medium-speed camera (150fps @ 1080p), it really works quite nicely. I know the iPhone 6+ now has slow-mo and O.I.S., but it's way more fun to play with gimbals and raw 350MB/s HD over USB3. Of course it meant I had 200GB of raw video to go through by the end of the Faire. I did all of the video editing in Lightworks using a JPEG timeline. (Each frame is a JPEG in a folder...somehow it manages to handle this without rendering.)

And that's pretty much it. It was much like other Maker Faires I've been to: lots of people asking questions with varying degrees of incisiveness ("Is that a drone?"), crazy old guys who come out right before the Faire ends to talk to you about their invention, and little kids trying to ride or touch things that they shouldn't be trying to ride or touch while their parents encourage them. Although I did get one or two very insightful kids who came by on their own and asked the most relevant questions, which gives me hope for the future. It was great to return to Cambridge and see everyone's cool new projects as well.

My MIT Maker Faire 2014 fleet by the numbers:
Projects: 5
Total Weight: ~75lbs
Total Number of Wheels: 6 (not including omniwheel rollers)
Total Number of Props: 8
Total Number of Motors: 18 (not including servos), 1 of my own design
Total Number of Motor Controllers: 18 (duh...), 16 of my own design!
Total Number of Cameras: 2

And here are some more pictures from the Faire:

I'm not sure what this is.
Dane's segboard, Flying Nimbus, which I got to ride. It actually has recycled Segstick parts!
Good old MITERS, where you can't tell where the shelves end and the floor begins!
Ed Moviing. I finally figured out a good way to power wireless HD transmitters...can you see?
A small portion of the EVs, lining up for a picture or a race or something.
Of course there were Tesla coils.
Flying out of Boston after the Faire, got a great Sunday morning view.

Sunday, September 7, 2014

Grasshopper3: Circular Buffer, Post Triggering, and Continuous Modes

Previously I have implemented a bare-bones RAM buffering program for the Grasshopper3 USB3 camera. The idea was to strip out all operations other than transferring raw image data from the USB3 port to RAM, so that the full frame rate of the camera can be buffered even on a Microsoft Surface2 tablet. While the RAM buffer is filling, no image conversion or saving to disk operations are going on. The GUI is running on another thread, and the image preview rate is held to 30Hz.

One-shot linear buffer with pre-trigger: 1) After triggering, frames are transferred into RAM (yellow). 2) RAM buffer is full. 3) Images are converted and saved to disk (not in real-time).
At the time I also tried to implement a circular buffer, where the oldest images are continuously overwritten in RAM. This allows for post-triggering, a common feature of high-speed cameras. The motivation for post-triggering is that the buffer is short (order of seconds) but you don't know when the exciting thing is going to happen. So you capture image data at full frame rate, continuously overwriting the oldest images in the buffer, until something exciting happens. The trigger can come after the action, stopping the image data capture and locking the previous N image frames in the buffer. The entire buffer can then be saved starting from the oldest image and ending at the trigger.

Circular buffer with post-trigger: 1) Buffer begins filling with frames. 2) After full, the oldest frame is overwritten with the newest. 3) A post-trigger stops buffering new frames and starts saving the previous N frames to disk.
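To make the post-trigger idea concrete, here is a minimal sketch in Python (not the actual C# capture code, which I can't share). The class and method names are made up for illustration, and frames are just short byte strings standing in for raw image data. The slots are allocated once up front and only overwritten afterwards.

```python
class CircularFrameBuffer:
    """Fixed-size ring of preallocated frame slots (hypothetical names)."""

    def __init__(self, capacity, frame_bytes):
        # Allocate every slot once, so later passes only copy data in.
        self.slots = [bytearray(frame_bytes) for _ in range(capacity)]
        self.capacity = capacity
        self.head = 0    # index of the next slot to write
        self.count = 0   # number of valid frames held (<= capacity)

    def push(self, frame):
        slot = self.slots[self.head]
        if len(frame) != len(slot):
            raise ValueError("frame size does not match slot size")
        # Once the buffer is full, this overwrites the oldest frame.
        slot[:] = frame
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def frames_oldest_first(self):
        # After a post-trigger, walk from the oldest frame to the newest.
        start = (self.head - self.count) % self.capacity
        return [bytes(self.slots[(start + i) % self.capacity])
                for i in range(self.count)]


# Push 6 frames into a 4-slot buffer, then "post-trigger" and read back.
buf = CircularFrameBuffer(capacity=4, frame_bytes=2)
for n in range(6):
    buf.push(bytes([n, n]))
saved = buf.frames_oldest_first()  # frames 2, 3, 4, 5 in that order
```

The two oldest frames (0 and 1) are gone by the time the trigger arrives, which is exactly the trade-off: only the most recent N frames survive.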
It didn't work the first time I tried it; the frame rate would drop after the first time through the buffer. But I did a little code cleanup - now the first time through it constructs the array elements that make up the frame buffer, but on subsequent passes it assumes the structures are already in place and just transfers in data. This makes a wonderful flat-top trapezoidal wave of RAM usage that corresponds exactly with the allocated buffer size:

Post-triggering is not the only thing a circular buffer structure is good for. It can also be used as the basis for robust (buffered) continuous saving to disk. Assuming a fast enough write speed, frames can be continuously taken out of the buffer on a First-In First-Out (FIFO) basis and written to disk. I say "disk" but in order to get fast enough write speeds it really does need to be a solid-state drive. And even then it is a challenge.

For one, the sequential write speed of even the fastest SSDs struggles to keep up with USB3. To achieve maximum frame rate, the saved images must be in a raw format, both to keep the size down (one color per pixel, not de-Bayered) and to avoid having the processor bottleneck the entire process during image conversion. Luckily there is an option to just spit out raw Bayer data in 8- or 12-bit-per-pixel formats. IrfanView (my favorite image viewer) has a plug-in that is capable of parsing and color-processing raw images. The plug-in also works with the batch conversion portion of IrfanView, so you can convert an entire folder of raw frames.
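As a rough back-of-the-envelope check on why raw matters, here is the data rate arithmetic in Python. It assumes 1080p means 1920x1080 and uses 24-bit RGB as a stand-in for a converted (de-Bayered) image; the function name is just for illustration.

```python
def raw_data_rate_mb_s(width, height, bits_per_pixel, fps):
    """Uncompressed stream rate in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bits_per_pixel / 8 * fps / 1e6


# Assumed frame size: 1920x1080 at 150 fps.
raw8 = raw_data_rate_mb_s(1920, 1080, 8, 150)    # raw Bayer, one 8-bit sample per pixel
rgb24 = raw_data_rate_mb_s(1920, 1080, 24, 150)  # de-Bayered, 24-bit RGB
print(f"{raw8:.0f} MB/s raw vs {rgb24:.0f} MB/s RGB")
# prints "311 MB/s raw vs 933 MB/s RGB"
```

Under those assumptions, saving converted frames would triple the amount of data the disk has to absorb, before even counting the processor time for the conversion.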

The other challenge is that the operations required to save to disk take up processor time. In the FlyCap2 software that comes with the camera, the image capture loop has no trouble running at full frame rate, but turning on the saving operation causes the frame processing rate to drop on my laptop and especially on the MS Surface 2. To try to combat this problem, I did something I've never done before: actually intentionally write a multi-threaded application the right way. The image capture loop runs on one thread while the save loop runs on a separate thread. (And the GUI runs on an entirely different thread...) This way, a slow-down on the saving thread ideally doesn't cause a frame rate drop on the capture thread. The FIFO might fill up a little, but it can catch up later.

Continuous saving: Images are put into the RAM buffer on one thread (yellow) and removed from it to write to disk on another thread (green). This happens simultaneously and continuously as long as the disk write can keep up.
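The two-thread arrangement can be sketched in a few lines of Python. This is only an illustration of the pattern, not the real program: Python's `queue.Queue` stands in for the RAM buffer, a plain list stands in for the disk, and the function names are hypothetical.

```python
import queue
import threading


def capture_loop(fifo, num_frames):
    # Producer: stands in for the camera thread; it never does any disk work.
    for n in range(num_frames):
        fifo.put(n.to_bytes(4, "little"))  # placeholder for raw frame data
    fifo.put(None)                         # sentinel: capture finished


def save_loop(fifo, sink):
    # Consumer: drains frames first-in first-out; `sink` stands in for the SSD.
    while True:
        frame = fifo.get()
        if frame is None:
            break
        sink.append(frame)


def run(num_frames=100, buffer_frames=16):
    fifo = queue.Queue(maxsize=buffer_frames)
    sink = []
    threads = [
        threading.Thread(target=capture_loop, args=(fifo, num_frames)),
        threading.Thread(target=save_loop, args=(fifo, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


written = run(num_frames=50, buffer_frames=4)
```

One simplification to be aware of: in this sketch the producer simply blocks when the FIFO is full. A real camera keeps delivering frames regardless, so a full buffer there means lost frames rather than a polite wait.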
There's another interesting twist to the continuous-saving circular buffer: the frame rate in doesn't necessarily have to equal the frame rate out. For example, it's possible to buffer into RAM at 150fps but save only every 5th frame, for 30fps output. Then, if something exciting happens, the outgoing rate can be switched to 150fps temporarily to capture the high-speed action. If the write-to-disk thread can't keep up, the FIFO grows in size. As long as the outgoing rate is switched back to 30fps before the buffer is full, the excess FIFO elements can be unloaded later.
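To get a feel for how the FIFO grows and shrinks with the frame rate divider, here is a toy simulation in Python. The numbers are assumptions chosen for illustration, not measurements: the writer is assumed to manage a constant 0.5 frames per incoming frame period (roughly 75fps worth of writing against 150fps in), and the function name is hypothetical.

```python
def simulate_backlog(divider_schedule, write_per_tick):
    """Return the FIFO backlog (in frames) after each incoming frame.

    divider_schedule: one divider per incoming frame
                      (5 = save every 5th frame, 1 = save every frame).
    write_per_tick:   frames the writer can save per incoming frame
                      period (assumed constant for this sketch).
    """
    backlog = 0.0
    history = []
    for n, divider in enumerate(divider_schedule):
        if n % divider == 0:
            backlog += 1  # this frame is queued for saving
        backlog = max(0.0, backlog - write_per_tick)
        history.append(backlog)
    return history


# 1 s at 30fps out, a 1 s burst at 150fps out, then 2 s back at 30fps out.
schedule = [5] * 150 + [1] * 150 + [5] * 300
history = simulate_backlog(schedule, write_per_tick=0.5)
peak = max(history)
final = history[-1]
```

With these made-up numbers the backlog climbs to roughly 75 frames during the burst and is fully drained again before the end of the run, so the RAM buffer would need at least that much headroom to survive a one-second burst.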

The key parameter for this continuous saving mode is the number of frames of delay between the incoming and the outgoing thread. The target delay could be zero, but then you would have to know in advance if you want to switch to high-speed saving. Setting the target delay to some number part-way through the buffer allows for post-triggering of the high-speed saving period, which seems more useful. I added a buffer graphic to the GUI to show both the target and the actual saving delay during continuous saving. My mind still has trouble thinking about when to turn the frame rate divider on and off, but I think it could be useful in some cases.

Here's some video I took to try out these new modes. It's all captured on the Microsoft Surface 2, so no fancy hardware required and it's all still very portable.

This is a simple test of the circular buffer at 1080p150 with post-trigger. The coin in particular is a good example of when it's nice to just leave the buffer running and post-trigger after a lucky spin gets the coin to land in frame.

More coin spinning, but this time using the continuous saving mode. Frames go into RAM at 150fps, but normally they are only written to disk at 30fps. When something interesting happens (such as the coin actually being in frame...), a burst of 150fps writing to disk is triggered. On the Surface 2, the write thread is slower than the read thread, so it can only proceed for a little while until the FIFO gets too full. Switching back to 30fps saving allows the FIFO to catch up.

Finally, a quick test of lower resolution / higher frame rate operation. At 480p, the frame rate can get up to 360+ fps. Buffering is fine at this frame rate (the overall data rate is actually lower). It actually doesn't require an insane amount of light either - the iPhone display is the only source of light here. You can see its IR proximity sensor LED flashing, as well as individual frame transitions on the display, behind the water stream. The maximum frame rate goes all the way up to 1100+ fps at 120p, something I have yet to try out.

That's it for now. The program (which started out as the FlyCapture2SimpleGUI source that comes with the camera) has a nice VC# GUI:

I can't distribute the source since it's derived from the proprietary SDK, but now you know it's possible and relatively easy to get it to capture and save efficiently with a bit of good programming. It was a fun project since I've never intentionally written interacting multi-threaded programs, other than maybe separating GUI threads from other things. I guess I'm only ten years or so behind on my application programming skills now...