Sunday, December 28, 2014

Fun with Pixel Shaders

One of the things that saved my ass for MIT Mini Maker Faire was SlimDX, a modern (.NET 4.0) replacement for Microsoft's obsolete Managed DirectX framework. I was only using it to replace the managed DirectInput wrapper I had for interfacing to USB HID joysticks for controlling Twitch and 4pcb. For that it was almost a drop-in replacement. But the SlimDX framework also allows for accessing almost all of DirectX from managed .NET code.

I've never really messed with DirectX, even though at one point long ago I wanted to program video games and 3D stuff. The setup always seemed daunting. But with a managed .NET wrapper for it, I decided to give it a go. This time I'm not using it to make video games, though. I'm using it to access GPU horsepower for simple image processing.

The task is as follows:

The most efficient way to capture and save images from my USB3.0 camera is as raw images. The file is a binary stream of 8-bit or 12-bit pixel brightnesses straight from the Bayer-masked sensor. If one were to convert these raw pixel values to a grayscale image, it would look like this:

Zoom in to see the checkerboard Bayer pattern...it's lost a bit in translation to a grayscale JPEG, but you can still see it on the car and in the sky.
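For the 8-bit case, reading such a file amounts to slicing a flat byte stream into rows. Here is a minimal Python sketch, assuming one byte per pixel, row-major order, no header, and dimensions known in advance. The function name and the layout are illustrative assumptions, not the camera's documented format, and 12-bit packing is left out because it varies by camera.

```python
def rows_from_raw8(data: bytes, width: int, height: int) -> list[list[int]]:
    """Split a headerless 8-bit raw byte stream into rows of pixel brightnesses."""
    if len(data) != width * height:
        raise ValueError("byte count does not match width * height")
    return [list(data[y * width:(y + 1) * width]) for y in range(height)]

# A tiny 4x2 "frame" standing in for the contents of a real file.
frame = rows_from_raw8(bytes([10, 200, 12, 198, 90, 11, 88, 13]), width=4, height=2)
```

For a real capture, the bytes would come from something like `open(path, "rb").read()`.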


The Bayer filter encodes color not on a per-pixel basis, but in the average brightness of nearby pixels dedicated to each color (red, green, and blue). This always seemed like cheating to me; to arrive at a full color, full resolution image, 200% more information is generated by interpolation than was originally captured by the sensor. But the eye is more sensitive to brightness than to color, so it's a way to sense and encode the information more efficiently.
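To make the "200% more" figure concrete, here is a small Python sketch that labels each sensor site with its filter color and counts samples. It assumes an RGGB tile layout (red at the top-left of every 2x2 tile). That is a common arrangement, but the layout of this particular sensor is an assumption here.

```python
def bayer_color(x: int, y: int) -> str:
    """Color of the filter over pixel (x, y), assuming an RGGB tile layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"

width, height = 4, 4
counts = {"R": 0, "G": 0, "B": 0}
for y in range(height):
    for x in range(width):
        counts[bayer_color(x, y)] += 1

captured = width * height          # one sample per pixel from the sensor
output = 3 * width * height        # R, G, and B for every output pixel
interpolated_pct = 100 * (output - captured) / captured
```

Half of the sites are green and a quarter each are red and blue. Two of every three output values are interpolated, which is where the 200% comes from.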

Anyway, deriving the full-color image from the raw Bayer-masked data is a fairly computationally intensive process. In the most serial implementation, it would involve four nested for loops that scan through each pixel and look at the color information from its neighboring pixels in each direction. In pseudo-code:

// Scan all pixels.
for y = 0 to (height - 1)
 for x = 0 to (width - 1)
  
  Reset weighted averages.  

  // Scan a local window of pixels.
  for dx = -N to +N
   for dy = -N to +N
    
    brightness = GetBrightness(x+dx, y+dy)
    Add brightness to weighted averages.
   
   next
  next

  Set colors of output (x, y) by weighted averages.

 next
next

The window size (2N+1)x(2N+1) could be 3x3 or 5x5, depending on the algorithm used. More complex algorithms might also have conditionals or derivatives inside the nested for loop. But the real computational burden comes from serially scanning through x and y. For a 1920x1080 pixel image, that's 2,073,600 iterations. Nest a 5x5 for loop inside of that and you have 51,840,000 times through. This is a disaster for a CPU. (And by disaster, I mean it would take a second or two...)
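The four nested loops can be sketched as runnable Python. This version assumes an RGGB layout and uses the simplest possible rule: each output color is the plain average of the same-colored sites inside the window, with out-of-bounds neighbors skipped. It is a stand-in to show the loop structure, not the weighted technique used later in the shader.

```python
def bayer_color(x, y):
    """Filter color at (x, y), assuming an RGGB tile layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"

def demosaic_serial(raw, n=1):
    """Average each color over a (2n+1)x(2n+1) window around every pixel."""
    height, width = len(raw), len(raw[0])
    out = []
    for y in range(height):                      # scan all pixels
        row = []
        for x in range(width):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            hits = {"R": 0, "G": 0, "B": 0}
            for dy in range(-n, n + 1):          # scan the local window
                for dx in range(-n, n + 1):
                    sx, sy = x + dx, y + dy
                    if 0 <= sx < width and 0 <= sy < height:
                        c = bayer_color(sx, sy)
                        sums[c] += raw[sy][sx]
                        hits[c] += 1
            row.append(tuple(sums[c] / hits[c] for c in "RGB"))
        out.append(row)
    return out

flat = [[100] * 4 for _ in range(4)]
rgb = demosaic_serial(flat)

# Innermost-loop count for a 1920x1080 image with a 5x5 window.
inner_iterations = 1920 * 1080 * 5 * 5
```

A uniformly gray input comes back as uniform gray, and `inner_iterations` reproduces the 51,840,000 figure.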

But the GPU is ideally suited for this task, since it can break up the outermost for loops and put them onto a crapload of miniature parallel processors with nothing better to do. This works because each pixel's color output is independent: it depends only on the raw input image. The little piece of software that handles each pixel's calculations is called a pixel shader, and pixel shaders are probably the most exciting new software tool I've learned in a long time.
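The independence property is easy to demonstrate on the CPU. In the Python sketch below, `shade` is a hypothetical toy per-pixel function, not the real debayer shader. It reads only the input image, so evaluating the pixels in a scrambled order gives exactly the same output. That is the property that lets a GPU hand each pixel to a different core.

```python
import random

def shade(raw, x, y):
    """Toy per-pixel function: reads only the input image, never the output."""
    height, width = len(raw), len(raw[0])
    right = raw[y][min(x + 1, width - 1)]
    below = raw[min(y + 1, height - 1)][x]
    return (raw[y][x] + right + below) / 3.0

def render(raw, order):
    """Evaluate shade() for every coordinate, in whatever order is given."""
    out = {}
    for x, y in order:
        out[(x, y)] = shade(raw, x, y)
    return out

raw = [[(3 * x + 7 * y) % 256 for x in range(8)] for y in range(6)]
coords = [(x, y) for y in range(6) for x in range(8)]
scrambled = coords[:]
random.Random(0).shuffle(scrambled)

in_order = render(raw, coords)
out_of_order = render(raw, scrambled)
```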

For my very first pixel shader, I've written a simple raw image processor. I know good solutions for this already exist, and previously I would use IrfanView's Formats plug-in to do it. But it's much more fun to build it from scratch. I'm sure I'm doing most of the processing incorrectly or inefficiently, but at least I know what's going on under the hood.

The shader I wrote has two passes. The first pass takes as its input the raw Bayer-masked image and calculates R, G, and B values for each pixel based on the technique presented in this excellent Microsoft Research technical article. It then does simple brightness, color correction, contrast, and saturation adjustment on each pixel. This step is a work-in-progress as I figure out how to define the order of operations and what techniques work best. But the framework is there for any amount of per-pixel color processing. One nice thing is that the pixel shader works natively in single-precision floating point, so there's no need to worry about bit depth of the intermediate data for a small number of processing steps.
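As an illustration of the per-pixel adjustments, here is a Python sketch working on one floating-point RGB pixel in the range 0 to 1. The order of operations (brightness, then contrast about mid-gray, then saturation) and the formulas are assumptions chosen for simplicity, not necessarily what the shader does. The luma weights are the standard Rec. 601 coefficients.

```python
def adjust(rgb, brightness=0.0, contrast=1.0, saturation=1.0):
    """Apply brightness, then contrast, then saturation to one pixel in [0, 1]."""
    # Brightness adds an offset; contrast scales about mid-gray (0.5).
    r, g, b = ((c + brightness - 0.5) * contrast + 0.5 for c in rgb)
    # Saturation blends between the pixel's gray level and its color.
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    mixed = (luma + (c - luma) * saturation for c in (r, g, b))
    # Clamp only at the very end, so intermediate values keep their full range.
    return tuple(min(1.0, max(0.0, c)) for c in mixed)

unchanged = adjust((0.2, 0.5, 0.8))
gray = adjust((0.2, 0.5, 0.8), saturation=0.0)
```

Because everything stays in floating point until the final clamp, an intermediate value that briefly goes above 1 or below 0 is not lost, which is the bit-depth convenience mentioned above.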


The second pass implements an arbitrary 5x5 convolution kernel, which can be used for any number of effects, including sharpening. It has to be a second pass because it needs the full-color output image of the first pass as its input; it can't be done as part of a single per-pixel operation with only the raw input image. So, the first pass renders its result to a texture (the storage type for a 2D image), and the second pass references this texture for its 5x5 window. The output of the second pass can be rendered to the screen, to another texture to be saved as a processed image file, or both.
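A CPU-side sketch of the 5x5 kernel pass, in Python, for a single color channel. It assumes reads past the image border are clamped to the nearest edge pixel, and it applies the kernel without flipping it, which gives the same result for symmetric kernels. The sharpening kernel is a hypothetical example, not the one from the shader.

```python
def convolve5x5(image, kernel):
    """Apply a 5x5 kernel to a 2D list of floats, clamping reads at the edges."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(5):
                for kx in range(5):
                    sx = min(max(x + kx - 2, 0), width - 1)
                    sy = min(max(y + ky - 2, 0), height - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out

identity = [[0.0] * 5 for _ in range(5)]
identity[2][2] = 1.0

# Hypothetical sharpening kernel: boost the center, subtract the four neighbors.
sharpen = [[0.0] * 5 for _ in range(5)]
sharpen[2][2] = 5.0
for ky, kx in ((1, 2), (3, 2), (2, 1), (2, 3)):
    sharpen[ky][kx] = -1.0

ramp = [[float(x + y) for x in range(6)] for y in range(6)]
flat = [[0.5] * 6 for _ in range(6)]
same = convolve5x5(ramp, identity)
still_flat = convolve5x5(flat, sharpen)
```

The identity kernel returns the image untouched. The sharpening kernel's weights sum to 1, so flat regions keep their brightness and only edges are exaggerated.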

What a lovely Seattle winter day.

Even though the pixel shader does all of the exciting work, there's still the matter of wrapping the whole thing in a .NET project with SlimDX providing the interface to DirectX. I did this with a simple VB program that has a viewport, folder browser, and some numeric inputs. For my purposes, a folder full of raw images goes together as a video clip. So being able to enumerate the files, scan through them in the viewer, and batch convert them to JPEGs was the goal.

Hrm, looks kinda like all my other GUIs...

As it turns out, the pixel shader itself is more than fast enough for real-time (30fps) processing. The time-consuming parts are loading and saving files. Buffering into RAM would help if the only goal were real-time playback, but outputting to JPEGs is never going to be fast. As it is, for a 1920x1200 image on my laptop, the timing is roughly 30ms to load the file, an immeasurably short amount of time to actually run both pixel shader passes, and then 60ms to save the file. Batch converting an entire folder of 1000 raw images to JPEG, including all image processing steps, took 93s (10.75fps), compared to 264s (3.79fps) in IrfanView.
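The numbers hang together, as a quick Python check shows. The only assumption is that the shader time is treated as zero. The per-frame estimate comes out at 90s, close to the measured 93s.

```python
frames = 1000
load_ms, shader_ms, save_ms = 30, 0, 60   # shader time treated as negligible

estimated_s = frames * (load_ms + shader_ms + save_ms) / 1000
rawview_fps = frames / 93        # measured batch time for this tool
irfanview_fps = frames / 264     # measured batch time in IrfanView
speedup = 264 / 93
```

That works out to roughly a 2.8x speedup, nearly all of it spent on file I/O rather than on image processing.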

Real-time scrubbing on the MS Surface Pro 2, including file load and image processing, but not including saving JPEGs.

There are probably ways to speed up file loading and saving a bit, but even as-is it's a good tool for setting up JPEG image sequences to go into a video editor. The opportunity to do some image processing in floating point, before compression, is also useful, and it takes some of the load off the video editor's color correction.

I'm mostly excited about the ability to write GPU code in general. It's a programming skill that still feels like a superpower to me. Maybe I'll get back into 3D, or use it for simulation purposes, or vision processing. Modern GPUs have more ways of using memory, which makes the cores more useful for general computing, not just graphics. As usual with new tools, I don't really know what I'll use it for; I just know I'll use it.

If you're interested in the VB project and the shader source (both very much works-in-progress), they are here:

VB 2012 Project: RawView_v0_1.zip
Built for .NET 4.0 64-bit. Requires the SlimDX SDK.

HLSL Source: debayercolor.fx

P.S. I chose DirectX because it's what I have the most experience with. (I know somebody will say, "You should use OpenGL.") I'm sure all of this could be done in OpenGL / GLSL as well.