## Tuesday, November 29, 2011

### First instance of sensorless sine commutation!

I won't quite call it sensorless field-oriented control yet, because it's still running open-loop sinusoidal commutation, but I've finally been able to yank the sensor cable:

Here's the proof in data:

 (Cap Kart's fancy data logger has certainly come in handy.)
What you see there is a plot (in time) of the motor RPM as measured directly by the Hall effect sensors (yellow) and indirectly by the open-loop flux estimator (blue). At first, both RPM measurements are running simultaneously and they overlap. At the point where the sensor cable is yanked, the RPM measurement from the sensors freezes and then times out to zero, but the flux estimator RPM continues to read properly. The commutation is being controlled by the flux estimator only and the motor continues to run, even under load.

For this test, the motor is being driven by a 12.5% command, which means the peak-to-peak value of the sine waves is 12.5% of the DC voltage. The no-load speed is therefore about 12.5% of maximum. The loaded speed dips to as low as about 6% during the test. This is a good sign that the flux estimator is capable of working at low speeds where the back EMF is small. I haven't written the start-up ramp yet, but this test tells me that I should be able to exit the start-up mode at less than 10% speed.

The properly-functioning flux estimator did not come easy. As I mentioned last time, I converted all the floating-point math to fixed-point 32-bit integer math to get the estimator to run at 10kHz. This led to several instances of both loss-of-precision and overflow problems. The biggest culprit was a low-pass filter being used as a pseudo-integrator on (V-IR). Here's how it looks in floating-point:

vir_a = 0.9996 * vir_a + 0.0004 * vir_a_temp;

Every 100μs, the variable vir_a is decreased by 0.04% and the same percentage of a new value, vir_a_temp, is added in. Since this happens 10,000 times per second, the overall effect is that of a low-pass filter with a time constant of 0.25s. The simple conversion to fixed-point math would be:

vir_a = vir_a * 9996 / 10000 + vir_a_temp * 4 / 10000;

The constants are moved to the opposite side of the variables so that left-to-right order of operations doesn't yield zero all the time. (9996 / 10000 = 0 in integer math.) But that's not the only issue:

[Start of horrible multi-day debugging sequence:]

If vir_a_temp is less than about 10^6, the minimum tolerable precision (~1%) of the second term is lost. Since the physical unit of vir_a and vir_a_temp is Webers, this means working in μWb or even nWb depending on the expected flux.

But, if vir_a is greater than about 2.1×10^5 (and it has to reach 10^6 or more to track vir_a_temp), then the first multiplication of the first term, vir_a * 9996, overflows a 32-bit signed integer.

Okay, no problem, let's just change the order of operations in the first term:

vir_a = vir_a / 10000 * 9996 + ...

Problem solved? Actually, no. Think about what happens on the first two iterations with vir_a_temp set to 10^6 and vir_a initially at zero.

(1) vir_a = 0 / 10000 * 9996 + 1000000 * 4 / 10000;

The result is 400.

(2) vir_a = 400 / 10000 * 9996 + 1000000 * 4 / 10000;

The result is...400 again. The first term is zero. You can probably see the problem. vir_a will never increase because the divide by 10000 kills off the small initial values completely. So, you can't divide by 10000 first because it kills the small initial values, and you can't multiply by 9996 first because it overflows the large steady-state values. My solution is so horrible that I'm actually ashamed to post it:

if((vir_a <= 214748) && (vir_a >= -214748))
{ vir_a = vir_a * 9996 / 10000; }
else
{ vir_a = vir_a / 10000 * 9996; }

(Note: 214748 is 2^31 / 10000)

There must be a more elegant way to do this...

[End of horrible multi-day debugging sequence.]
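One possibly cleaner alternative, sketched here rather than taken from the actual controller code: form both products in a 64-bit intermediate and divide once at the end. This assumes the compiler provides int64_t and that the extra cost fits in the 100μs budget, which is not measured here. The names lpf_step and lpf_run are made up for illustration.

```c
#include <stdint.h>

/* One 100us step of the 0.9996 / 0.0004 low-pass filter.
 * Both products are formed in 64 bits, so large steady-state values do not
 * overflow and small start-up values are not divided away before scaling. */
static int32_t lpf_step(int32_t vir_a, int32_t vir_a_temp)
{
    int64_t acc = (int64_t)vir_a * 9996 + (int64_t)vir_a_temp * 4;
    return (int32_t)(acc / 10000);
}

/* Hypothetical helper: run the filter for a number of steps with a constant input. */
static int32_t lpf_run(int32_t vir_a, int32_t vir_a_temp, uint32_t steps)
{
    while (steps--) {
        vir_a = lpf_step(vir_a, vir_a_temp);
    }
    return vir_a;
}
```

With vir_a_temp held at 10^6, the first two steps give 400 and then 799, so the output no longer sticks at 400. Because the division truncates, the output settles a little below a constant input (within about 2500 counts for these constants). The Cortex-M3 has no 64-bit divide instruction, so the final division runs as a software routine and may be the expensive part.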

After sorting out that mess, I could finally get to the interesting part, which was actually testing the flux estimator to see if it works. Because I opted to paste the flux estimator directly in parallel with my existing sensor-based control code, the testing was greatly simplified:

At all times, the Hall effect sensors are being read and a rotor electrical angle is interpolated from them. Separately, the flux estimator is running on all three phases. At flux zero-crossings, a "virtual Hall effect sensor" transition is generated. The rotor electrical angle based on flux is interpolated from these virtual Hall effect sensor transitions, using the same method as the real sensors. This is where the two methods intersect, and either electrical angle can be fed into the field-oriented control algorithm (or, in this case, directly into the sine wave generator).
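The "virtual Hall effect sensor" idea can be sketched roughly as below. This is an illustration under assumptions, not the code running on the controller: the bit assignment, the 16-bit angle scaling, and the function names are all hypothetical, and any hysteresis around the zero-crossing is left out.

```c
#include <stdint.h>

/* Pack the signs of the three estimated phase fluxes into a 3-bit state,
 * analogous to the three digital outputs of real Hall effect sensors.
 * Bit order (A = bit 0, B = bit 1, C = bit 2) is an arbitrary choice here. */
static uint8_t virtual_hall_state(int32_t flux_a, int32_t flux_b, int32_t flux_c)
{
    uint8_t state = 0;
    if (flux_a > 0) state |= 0x1;
    if (flux_b > 0) state |= 0x2;
    if (flux_c > 0) state |= 0x4;
    return state;
}

/* Interpolate the electrical angle inside one 60-degree sector.
 * Angles are in units of 1/65536 of an electrical revolution.
 * sector_start:     angle at the most recent (virtual) sensor transition
 * ticks_since:      fast-loop ticks elapsed since that transition
 * ticks_per_sector: ticks the previous sector took (the speed estimate) */
static uint16_t interpolate_angle(uint16_t sector_start,
                                  uint32_t ticks_since,
                                  uint32_t ticks_per_sector)
{
    const uint32_t sector = 65536u / 6u;   /* 60 electrical degrees */

    if (ticks_per_sector == 0) {
        return sector_start;               /* no speed estimate yet */
    }
    if (ticks_since > ticks_per_sector) {
        ticks_since = ticks_per_sector;    /* hold at the sector edge */
    }
    return (uint16_t)(sector_start + (sector * ticks_since) / ticks_per_sector);
}
```

A change in the value returned by virtual_hall_state between two fast-loop passes marks a flux zero-crossing, which is where the sector start angle and the sector time would be updated.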

Both electrical angles (from flux and from sensors) are recorded and periodically transmitted to the data logger. Because the commutation frequency is much greater than the transmit frequency, plotting either angle against time would not be very informative. But, plotting them against each other tells the whole story:

The rotor electrical angle derived from Hall effect sensors, which were properly sequenced and timed, is nicely correlated to the rotor angle derived from the flux estimator. The "virtual Hall effect sensors" from the flux estimator do not require sequencing or timing, since they are linked to the correct phase variables already. (No more guessing wire combinations!)

There is a bit of a staircase effect which I have yet to determine a cause for. There are also still a few bugs to track down, including something that is causing the field-oriented control to freak out. And I haven't even begun to write in the start-up and fault detection, but at least progress has finally been made.

## Saturday, November 19, 2011

### Ah, software optimization, my old friend. It's been a while.

In a somewhat drastic attempt to force myself to get work done, I have decided not to ride Pneu Scooter until I implement sensorless field-oriented control on its controller, a 3ph 3.1 board that's been happily commutating the motor off Hall effect sensors for something like 2-3 million electrical cycles. It's pretty much an ideal test platform since the motor is well-characterized (I built it) and the controller is known to be reliable. It's also a relatively low-frequency system (175Hz at top speed), so the sensorless algorithm need not be absurdly fast to keep up with the commutation frequency. But, I do plan to use this algorithm on faster motors, so I'm designing for higher frequencies anyway.

The highest frequency the controller could run at is the PWM frequency, in this case 15.6kHz. Above this speed, it's not possible to update the commanded voltage to the motor fast enough, since the PWM duty cycle is not latched into a timer until the next PWM cycle. So, even if you could run a sensorless algorithm faster, there would be almost no point.

Back when I was using the MSP430F2274, I ran a "fast loop" at 14.4kHz (PWM frequency) to handle sensor polling, speed estimation, position interpolation, and updating the three phase motor PWMs from a sine look-up table. The "slow loop" ran at 122Hz and handled current sampling, coordinate transformation into the rotating reference frame, and feedback control of the q-axis and d-axis currents.

 This.
On that processor, which doesn't have a hardware multiplier, getting the fast loop to run at 14.4kHz was a major challenge. I spent a large portion of the development time just optimizing the software using a number of tricks to get the processing time down to 53μs (for two motors). The largest single processor burden, accounting for 10μs, was the integer division required to get the speed of the motor, 1/(time per cycle). This was large enough that I couldn't compute both motor speeds in the same fast loop cycle; one always got priority.

 High = processor in use. Low = processor free. The lighter portion of the trace is the single integer division.
Even totally optimized with all integer math (no floating-point), the dual FOC was just barely able to fit in the fast loop at 14.4kHz. The leftover processor time went to the slow loop, which could be run arbitrarily slow thanks to the coordinate transformation to the rotating frame. So, having floating-point controller math in the slow loop has never been an issue.

But now, I've moved on to the STM32F103 32-bit ARM processor, which has a hardware multiplier. Though so far I haven't done anything other than waste the extra processing power on silly things, one of my motivations for giving up my beloved MSP430 was to be able to implement sensorless field-oriented control. But first, for comparison, here's what single-motor FOC code looks like ported to the STM32F103:

The fast loop on this processor runs at 10kHz. It's no longer tied to the PWM frequency, so it's easy to change. I chose 10kHz for simple math. Using the same clock speed as the MSP430, single-motor FOC takes just under 7μs. The integer operations, including multiplication, happen in one clock cycle instead of the 50-60 it took on the MSP430. Integer division is also fast, though to the best of my knowledge it isn't single-cycle. And these are 32-bit operands, so they inherently have more precision than the MSP430's 16-bit integers.

What about floating-point? I thought maybe the fast integer hardware would be leveraged somehow to make floating-point operations faster as well, even though the STM32F1-series does not have a hardware floating-point unit. So, I threw on my first attempt at a rotor flux observer, all implemented in floating-point math, to test this.

Mathematical Tangent:
The rotor flux observer, used to estimate the position of the rotor magnets in lieu of Hall effect sensors, is a simple one that I mentioned before:

It's an open-loop rotor flux observer, meaning there is no feedback to correct the estimated flux. It relies on a reasonably accurate estimate of R and L (the motor resistance and inductance, respectively) to produce the flux estimate. I did a little more thinking and decided that this is a good place to start, instead of jumping right into closed-loop flux observers. The nice things about the open-loop rotor flux estimator, in my view, are:
• It's very obvious what it's doing, in the context of the motor electrical model. In my experience, simple things tend to work more reliably.
• It estimates flux, instead of back EMF. The value of flux is speed-independent, so the amplitude of the flux estimate should remain constant. The integrator also filters out noise in the current measurement. No derivative of current is required.
• The effect of parameter offset is easy to analyze. More on this in a later post, but it's easy to show with simple geometrical arguments what the effect of an improperly-set R or L is.
I think in time I will move back toward the closed-loop observer, which can compensate for the parameter offset automatically, but for now this is what I'm starting with. So, the fast-loop code must sample the phase current sensors and the DC voltage. Phase voltage is computed as a duty cycle fraction of the DC voltage, based on the sine look-up table. The integration is implemented as a low-pass filter, to kill off DC offset.
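For reference, an open-loop rotor flux estimator of this kind is commonly written, per phase, as shown below. This is a reconstruction from the description here (an integrator on V-IR, with R and L as the only parameters), not a copy of the original equation, and the symbol names are assumptions.

```latex
\hat{\lambda}_a = \int \left( V_a - I_a R \right) dt \; - \; L I_a
```

Replacing the pure integrator with a low-pass filter of time constant τ, scaled by τ, gives nearly the same result at electrical frequencies well above 1/τ, while keeping a DC offset from accumulating without bound:

```latex
\frac{\tau}{\tau s + 1} \approx \frac{1}{s} \qquad \text{for } \omega \gg \frac{1}{\tau}
```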

All three phase fluxes are estimated, and used to establish the motor position through what I will call "virtual Hall effect sensors" that detect flux zero-crossings. This method, though completely non-traditional and probably stupid, allows me to tape the flux observers to the back of my existing sensor-based FOC code and get up and running quickly.

Okay back to software:
Implementing the above open-loop flux estimator in all floating-point math was a terrible idea:

It looks like the processor utilization is 99%, but it's actually more like 120%. The 10kHz loop took so long that it actually took 120μs to finish each cycle...so I'm not even sure if it really finished or if the interrupt controller just gave up. The flux observer alone took about 80μs to run. I determined that each floating point multiply was taking about 7.44μs, or close to 120 clock cycles at 16MHz. So clearly the floating point math is still being done in software, and it's not really leveraging the hardware multiplier at all to speed things up.

So began a day or two of modifying the code to run faster without actually changing what it does. Software optimization is probably one of the most thankless tasks ever, since you make something work the same way it did before and nobody can see the difference. But I still find it somewhat fun to try to squeeze every bit of time out of a control loop.

First, I now have the ability to stick all my state variables into 32-bit integers and use the extra precision to make up for the lack of floating-point capability. For example, instead of Phase A's current being represented in software as 38.7541723[A], it can be 38754[mA]. I don't care about sub-mA precision, and that still leaves me plenty of multiplication overhead. By that I mean I can still multiply the current, which fits in 17 bits of signed int, by up to 15 bits of precision without overflowing the 32-bit signed int.

For example, the floating-point current scaling would have been:

float raw_current, scaled_current;
scaled_current = raw_current * 0.0561;

This scaled the raw ADC value to physical units of Amps. But it has more final precision than is really necessary and can be replaced with all integer math:

signed int raw_current, scaled_current;
scaled_current = raw_current * 561 / 10;

Now the scaled current is an integer value in mA. The intermediate precision required is about 22 bits. (12 bits for the raw ADC value plus 10 bits for the scaling operand 561.) The integer division by 10 is fast, and the precision is high enough that the truncated result is still perfectly fine.

After thus converting all the floating-point operations to integer math, the fast loop consisting of FOC, ADC sampling, and flux observer was down to 50.5μs at 16MHz:

This would already be good enough to run, but there are several other processing efficiency tricks I had ready to deploy. One obvious target for efficiency improvement is the ADC sampling. The STM32F103, like many other microcontrollers, uses a Successive Approximation Register (SAR) ADC, which is sort-of like a guess-and-check process for converting an analog value to digital representation. Each guess takes time, so many cycles are spent waiting for the conversion to complete. As implemented above, the processor would just sit there waiting for the sample to finish.

With the ADC settings I have been using, each sample takes 4.55μs. The fast loop samples four analog channels, taking a total of 18.2μs. For most of that time, the processor is waiting for the ADC to finish. It doesn't need to be, though, since the ADC can run on its own and trigger an interrupt when it's done converting. Implementing this was straightforward: I have the fast loop code start the ADC and then have it cycle through the four samples on its own. After each sample, the ADC interrupt retrieves the data and moves on to the next sample. While waiting for the ADC, the processor returns to the main loop.
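The interrupt-driven sequence described above might be structured something like this. It is a hardware-agnostic sketch, not the real STM32F103 code: adc_start_conversion is a hypothetical stand-in for writing the ADC's channel-select and start registers, and the four-channel count follows the 18.2μs figure above.

```c
#include <stdint.h>

#define NUM_CHANNELS 4

static volatile uint16_t adc_results[NUM_CHANNELS];
static volatile uint8_t  adc_index = NUM_CHANNELS;   /* NUM_CHANNELS means idle */
static volatile uint8_t  adc_requested_channel;      /* stand-in for the hardware */

/* Hypothetical hardware hook: on the real part this would select the channel
 * and trigger a conversion. Here it only records the request. */
static void adc_start_conversion(uint8_t channel)
{
    adc_requested_channel = channel;
}

/* Called once per fast-loop pass: kick off the first conversion and return. */
static void adc_sequence_begin(void)
{
    adc_index = 0;
    adc_start_conversion(0);
}

/* Called from the ADC end-of-conversion interrupt with the converted value:
 * store it, then start the next channel if any remain. */
static void adc_conversion_done(uint16_t value)
{
    if (adc_index >= NUM_CHANNELS) {
        return;                          /* spurious interrupt while idle */
    }
    adc_results[adc_index] = value;
    adc_index++;
    if (adc_index < NUM_CHANNELS) {
        adc_start_conversion(adc_index);
    }
}

/* Nonzero once all channels of the current sequence have been stored. */
static int adc_sequence_complete(void)
{
    return adc_index >= NUM_CHANNELS;
}
```

The bookkeeping inside adc_conversion_done is the per-sample overhead that the interrupt approach adds on top of the conversion time itself.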

This showed only marginal improvement. The total processor utilization is still about 50%, because the idle time saved by not waiting for the ADC is offset by extra processing time to decide what to do with the data. The result is that each sample now takes about 8μs, four of which are spent converting and four of which are spent deciding what the data is and where to put it. The data manipulating part can be completely eliminated by using yet another hardware feature of this processor, the Direct Memory Access (DMA) peripheral. The ADC can tell the DMA to automatically transfer data to a specified memory location, with no processor involvement. This would completely automate the sampling, and bring the total processing time for the fast loop down to about 28μs.

But, there's also another way to squeeze even more horsepower out of the STM32F103. The clock speed can be multiplied by an on-board Phase-Locked Loop (PLL) to up to 72MHz. (Right now, the 16MHz oscillator sets the clock speed directly.) I've never even bothered to try turning on the PLL, but now seemed like a good time to see what it would do. For some yet-unknown reason, I was only able to multiply my 16MHz oscillator by as much as 3.5 (or rather, divide it by two and then multiply it by seven...don't ask). That gives me 56MHz. I'm not sure why it won't go higher than that, but I suspect some hidden clock speed limit on a peripheral. But I tracked down all the obvious ones, and none were overclocked. Anyway, here's what the fast loop processor utilization looks like at 56MHz:

The entire FOC and flux estimator now take only 8μs. The ADC samples still take about 4μs, but the amount of that time spent processing data is greatly reduced. (The sampling time itself is limited by the ADC's peripheral clock speed limit of 14MHz, but the data manipulating time is based on the system clock.) The total processor utilization is now about 20%, leaving room for increasing the fast-loop rate or doing more processing for a closed-loop observer.

For now, though, I'm satisfied that the flux observer will run happily at 10kHz and I'm merging it with the FOC code I already have. Hopefully I will get to test it before Thanksgiving.

## Friday, November 4, 2011

### Hey Shane, why haven't you been working on your motor controllers?

I haven't done much on any of my motor controller projects in a while. For example, DirectDrive hasn't been touched since the July update where it passed a 2.4kVA bench test. And I have yet to do anything other than theorize about sensorless FOC code. And that was so long ago that I would have to reread the post to remember what I was talking about. But now that the season of building and testing vehicles is winding down, and I have sworn off doing any demos, faires, expos, exhibitions, presentations, or showcases until the spring, I can actually have time to get back to motor controllers. For real this time.

First of all, even though I've been relatively pleased with the Kelly KBS36101 controllers on tinyKart, they are frustrating to work with. With the external Hall effect sensors perfectly positioned, the performance at 80A per motor is spectacular. But, if the sensors go out of timing even by just a little bit, the controllers cut out and you lose power to one or both sides. Also, they won't work at all at full current (100A) and they have issues at full speed. (The baseline KBS is only rated to 40,000erpm, which is about 75% of this motor's top speed.) Recently, with the changeover to SK3 motors, the controller bugs have just gotten buggier. So...

 I pulled out its heart...
 ...and put it in a box.
If I'm gonna deal with buggy controllers, they might as well be my buggy controllers. DirectDrive was pretty much designed for tinyKart. If I were as hardcore as I was back in 2008 when we built Cap Kart, I would have put it on from the start. But I guess building the entire chassis was enough of a challenge and we did get quite a lot of use out of the Kelly controllers. Still, screw it, time for the complete power system overhaul.

DirectDrive has yet to be tested under any real load, so by committing it to a vehicle, I am forced to solve any to-be-revealed bugs. I guess that means I should also buy more DirectFETs and solder paste, since things rarely go well on the first version of a motor controller. I've only built one so far, but I have enough parts for a second. The payoff will be:
• More power. DirectDrive is sized for 200A peak at 48V. In this application, assuming it doesn't just outright blow up (it will), it should have no trouble pushing 100A peak at 40V.
• Sinusoidal field-oriented control. I'm not sure this is directly advantageous on such high-speed, low-inductance motors. But the side effect of having complete control of motor timing in software is worth it. No more screwing around with sensor positioning.
• Wireless data. I can get vehicle performance data off these controllers, and use the data to help debug failures. (As opposed to the Kelly controllers, where ***  ** is the universal indicator for every possible failure...)
• Wireless throttle? It worked for Cap Kart.
• Shedding weight. DirectDrive is a tiny bit lighter than the KBS.
For all that, the only cost is probably dozens of hours of troubleshooting...

You might also notice the new battery pack. I'm tired of worrying about LiPos and am willing to take a ~20% energy capacity cut for the peace-of-mind that comes with A123 LiFePO4 cells. But in order to match the power density of the LiPos, they will have to be m1-B cells. (The green ones, not the paper ones from that DeWalt Drill tree I found.) These have a lower internal resistance, such that a 6lb, 12S3P pack can put out bursts of 6-8kW.

But yeah, messing with tinyKart's Hall effect sensors has reminded me just how much of a pain in the ass it is. It would be really, really nice to never have to play the phase-and-sensor-musical-chairs-wiring-game ever again. I am still very convinced that vehicle-grade sensorless control is a possibility. In fact, we recently found proof:

That is a \$28 shady eBay eBike controller, similar to the ones that bailed me out in Singapore. Except, it's sensorless. And the start-up doesn't suck. Unlike RC plane controllers, it ramps the output frequency and voltage gradually at start-up so that you get a smooth acceleration. It also does current (torque) control in both start-up and running modes. It's quite nice on Pneu Scooter and RazEr Revolution. But it's still dumb square wave drive, and it simply refuses to start ultra-low-resistance outrunners. However, it's proof that a smooth ramping start-up is achievable.

So, that will be my first task on the way to full sensorless field-oriented control. Start-up is regarded as the hard part of sensorless control, but for some reason I think it's easy. I will be aiming for a sinusoidal drive current with a ramping amplitude and frequency determined by some estimate of the system inertia. It may need locked-rotor detection or a way to reset and try again if it fails to achieve commutation. But I think it will be pretty easy, actually, compared to the rest of the project:

Once the speed is high enough (10%?), the fun begins. Unlike the \$28 version, I will be going for full sinusoidal field-oriented control, with no direct measurement of back EMF. I'm sure this will keep me occupied in software for quite a while. It is my first big software project in a long time...and sadly I'm kind-of excited for it. So much so that I violated one of my long-standing software principles:

 I split my build into multiple files. :/
I'll be implementing sensorless field-oriented control first on the 3ph 3.1, Pneu Scooter's controller. It's been totally fine ever since I replaced the FETs that died in Singapore. I'm still not 100% sure what the cause of that failure was, but since I've never seen it before or since, I will assume it was something specific (like bad sensor timing....) or just high ambient temperature and humidity. Once sensorless Pneu Scooter is up and running, I can think about porting it over to DirectDrive.

Let the season of motor controllers begin...