Saturday, October 9, 2021

Zynq Ultrascale+ Bare Metal NVMe: 2GB/s with FatFs + exFAT

This is a quick follow-up to my original post on speed testing bare metal NVMe with the Zynq Ultrascale+ AXI-PCIe bridge. There, I demonstrated a lightweight NVMe driver running natively on one Cortex-A53 core of the ZU+ PS that could comfortably achieve >1GB/s write speeds to a suitable M.2 NVMe SSD, such as the Samsung 970 Evo Plus. That's without any hardware acceleration: the NVMe queues are maintained in external DDR4 RAM attached to the PS, by software running on the A53.

I was actually able to get to much higher write speeds, over 2.5GB/s, writing directly to the SSD (no file system) with block sizes of 64KiB or larger. But this only lasts as long as the SLC cache: Modern consumer SSDs use either TLC or QLC NAND flash, which stores three or four bits per cell but is slower to write than single-bit SLC, so drives allocate some of their free space as an SLC buffer to achieve higher peak write speeds. Once the SLC cache runs out, the drive drops down to a lower sustained write speed.
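To put rough numbers on the SLC cache effect, here's a minimal two-phase model in C. The first cache_bytes are written at the cache rate and the rest at the post-cache rate. The function name and all parameters are hypothetical. Real drives are messier: the cache size is often dynamic, and some drives dip below their steady rate right after the cache runs out. So treat this as a back-of-the-envelope sketch only.

```c
/* Hypothetical two-phase SLC cache model (not measured data):
   cache_bytes are written at cache_rate, the remainder at post_cache_rate.
   Returns the average rate over a full fill of capacity_bytes.
   Rates are in bytes per second, or any consistent unit. */
double avg_fill_rate(double capacity_bytes, double cache_bytes,
                     double cache_rate, double post_cache_rate)
{
    if (cache_bytes > capacity_bytes) {
        cache_bytes = capacity_bytes; /* cache can't exceed the drive */
    }
    double t = cache_bytes / cache_rate
             + (capacity_bytes - cache_bytes) / post_cache_rate;
    return capacity_bytes / t;
}
```

For example, a hypothetical 900-unit drive with a 300-unit cache at 3 units/s and 1.5 units/s afterward averages 1.8 units/s over a full fill. That is much closer to the post-cache rate than to the peak.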

It's not easy to find good benchmarks for sustained sequential writing. The best I've seen are from Tom's Hardware and AnandTech, but only as curated data sets in specific reviews, not as a global data set. For example, this Tom's Hardware review of the Sabrent Rocket 4 Plus 4TB has good sustained sequential write data for competing drives. And, this AnandTech review of the Samsung 980 Pro has some more good data for fast drives under the Cache Size Effects test. My own testing with some of these drives, using ZU+ bare metal NVMe, has largely aligned with these benchmarks.

The unfortunate trend is that, while peak write speeds have increased dramatically in the last few years, sustained sequential write speeds may have actually gotten worse. This trend can be seen globally as well as within specific lines. (It might even be true within different date codes of the same drive.) Take for example the Samsung 970 Pro, an MLC (two bits per cell) drive released in 2018 that had no SLC cache but could write its full 1TB capacity at over 2.5GB/s. Its successor, the 980 Pro, has much higher peak SLC cache write speeds, nearing 5GB/s with PCIe Gen4, but dips down to below 1.5GB/s at some points after the SLC cache runs out.

Things get more complicated when considering the allocation state of the SSD. The sustained write benchmarks are usually taken after the entire SSD has been deallocated, via a secure erase or whole-drive TRIM. This restores the SLC cache and resets garbage collection to some initial state. If instead the drive is left "full" and old blocks are overwritten, the SLC cache is not recovered. However, this may also result in faster and more steady sustained sequential writing, as it prevents the undershoot that happens when the SLC cache runs out and must be unloaded into TLC.

So in certain conditions and with the right SSD, it's just possible to get to sustained sequential write speeds of 2GB/s with raw disk access. But, what about with a file system? I originally tested FatFs with the drive formatted as FAT32, reasoning (incorrectly) that an older file system would be simpler and have less overhead. But as it turns out, exFAT is a much better choice for fast sustained sequential writing.

The most important difference is how FAT32 and exFAT check for and update cluster allocation. Clusters are the unit of storage allocated to files: every file takes up an integer number of clusters on the disk. The clusters don't have to be sequential, though, so the File Allocation Table (FAT) contains chained lists of clusters representing each file. For sequentially-written files, this list is contiguous, but the FAT allows clusters to be chained together in any order for non-contiguous files. Each 32b entry in the FAT is just a pointer to the next cluster in the file.
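To make the chained-list idea concrete, here's a minimal sketch of walking a FAT32 cluster chain held in RAM. It assumes the FAT has already been read into an array. A real driver like FatFs reads it one sector at a time instead. The function name is hypothetical. In FAT32, only the low 28 bits of each entry are used, and values of 0x0FFFFFF8 and above mark the end of a chain.

```c
#include <stddef.h>
#include <stdint.h>

/* FAT32 end-of-chain markers are 0x0FFFFFF8..0x0FFFFFFF (low 28 bits). */
#define FAT32_EOC_MIN 0x0FFFFFF8u

/* Hypothetical helper: follow the chain starting at first_cluster, storing
   up to max_out cluster numbers in out[]. Stops at end-of-chain, at an
   out-of-range or free (0) entry, or when out[] is full (which also guards
   against a corrupted, cyclic chain). Returns the number of clusters stored. */
size_t fat32_walk_chain(const uint32_t *fat, size_t fat_len,
                        uint32_t first_cluster, uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t c = first_cluster;
    while (n < max_out && c >= 2 && c < fat_len) { /* clusters 0 and 1 are reserved */
        out[n++] = c;
        uint32_t next = fat[c] & 0x0FFFFFFFu;      /* top 4 bits are reserved */
        if (next >= FAT32_EOC_MIN) {
            break;                                 /* last cluster of the file */
        }
        c = next;
    }
    return n;
}
```

A contiguous file just has each entry pointing to the next cluster number (2 → 3 → 4), while a fragmented one can jump around (5 → 9 → 7).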

FAT32 cluster allocation entirely based on 32b FAT entries.

In FAT32, the cluster entries are mandatory and a sequential write must check and update them as it progresses. This means that for every cluster written (64KiB in maxed-out FAT32), 32b of read and write overhead is added. In FatFs, this gets buffered until a full LBA (512B) of FAT update is ready, but when this happens there's a big penalty for stopping the flow of sequential writing to check and update the FAT.

In exFAT, the cluster entries in the FAT are optional. Cluster allocation is handled by a bitmap, with one bit representing each cluster (0 = free, 1 = allocated). For a sequential file, this is all that's needed. Only non-contiguous files need to use the 32b cluster entries to create a chain in the FAT. As a result, sequential writing overhead is greatly reduced, since the allocation updates happen 32x less frequently.
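The 32x figure falls out of simple arithmetic. A 512B sector holds 128 FAT32 entries but 4096 bitmap bits. The sketch below, with hypothetical function names, computes how much sequential data can be written between allocation-sector updates. It uses a simplified model that ignores FatFs's own sector caching and any directory-entry updates.

```c
#include <stdint.h>

/* FAT32: one 32-bit (4-byte) FAT entry per cluster, so one FAT sector
   covers sector_bytes / 4 clusters of file data. */
uint64_t fat32_bytes_per_fat_sector(uint32_t sector_bytes, uint32_t cluster_bytes)
{
    uint64_t entries = sector_bytes / 4u;
    return entries * cluster_bytes;
}

/* exFAT (contiguous file): one allocation bitmap bit per cluster, so one
   bitmap sector covers sector_bytes * 8 clusters of file data. */
uint64_t exfat_bytes_per_bitmap_sector(uint32_t sector_bytes, uint32_t cluster_bytes)
{
    uint64_t bits = (uint64_t)sector_bytes * 8u;
    return bits * cluster_bytes;
}
```

With 512B sectors and 64KiB clusters, that's one FAT sector every 8MiB for FAT32 versus one bitmap sector every 256MiB for exFAT.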

exFAT cluster allocation using bitmap only for sequential files.

The cluster size in exFAT is also not limited to 64KiB. Using larger clusters further reduces the allocation update frequency, at the expense of more dead space between files. If the plan is to write multi-GB files anyway, having 1MiB clusters really isn't a problem. And speaking of multi-GB files, exFAT doesn't have the 4GiB file size limit that FAT32 has, so the file creation overhead can also be reduced. This does put more data "at risk" if a power failure occurs before the file is closed. (Most of the data would probably still be in flash, but it would need to be recovered manually.)
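The cluster-size and file-size tradeoffs can be quantified the same way. This sketch uses hypothetical names, with all sizes in bytes. It computes two things: the dead space at the end of a file, and how many files are needed to fill a drive when each file is capped at a maximum size. The file count is a proxy for how often the file creation and close overhead is paid.

```c
#include <stdint.h>

/* Unused bytes in the last cluster of a file (the "dead space"). */
uint64_t cluster_slack(uint64_t file_bytes, uint64_t cluster_bytes)
{
    uint64_t rem = file_bytes % cluster_bytes;
    return rem ? cluster_bytes - rem : 0;
}

/* Files (and therefore create/close operations) needed to fill total_bytes
   when no file may exceed max_file_bytes. Integer ceiling division. */
uint64_t files_needed(uint64_t total_bytes, uint64_t max_file_bytes)
{
    return (total_bytes + max_file_bytes - 1) / max_file_bytes;
}
```

For a nominal 2TB (2,000,000,000,000 bytes), FAT32's cap of 4GiB minus one byte means 466 files, while 16GiB exFAT files need only 117. The cost of 1MiB clusters is at most 1MiB minus one byte of slack per file, which is negligible next to a multi-GB file.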

All together, these features reduce the overhead of exFAT to almost nothing:

With 1MiB clusters and 16GiB files, it's possible to get ~2GB/s of sustained sequential file writing onto a 980 Pro for its entire 2TB capacity. I think this is probably the fastest implementation of FatFs in existence right now. The data block size still needs to be at least 64KiB, to keep the driver overhead low. But if a reasonable amount of streaming data can be buffered in RAM, this isn't too much of a constraint. And of course you do have to keep the SSD cool.
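The minimum block size can be understood with a simple fixed-overhead model. Suppose every write command costs some constant software and queue handling time on top of the data transfer. Then small blocks spend most of their time on overhead. This sketch is an assumption for illustration, not a model of this particular driver. The function name is hypothetical, and the overhead value would have to be measured.

```c
/* Hypothetical fixed-overhead model: each write command takes
   block_bytes / stream_rate for the data plus a constant per-command
   overhead. Returns the effective rate in the same units as stream_rate. */
double effective_rate(double block_bytes, double stream_rate,
                      double per_cmd_overhead_s)
{
    return block_bytes / (block_bytes / stream_rate + per_cmd_overhead_s);
}
```

The effective rate reaches half the streaming rate when the transfer time equals the per-command overhead. It approaches the full streaming rate as blocks get larger, which is why buffering streaming data in RAM and writing it in large blocks helps.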

I've updated the bare metal NVMe test project to Vivado/Vitis 2021.1 here. It would still require some effort to port to a different board, and I still make no claims about the suitability of this driver for any real purposes. But if you need to write massive amounts of data and don't want to mess around in Linux (or want to try something similar in Linux user space...) it might be a good reference.

Sunday, September 12, 2021

TinyCross: New UI and Front Wheel Traction Control

In the last post, I finally did some actual data logging with TinyCross set up in 4WD, 80A peak per motor, which is the rated current. Based on tinyKart, I know they can handle a bit more for short durations, maybe even up to 120A. But the data logs (and many instances of having rocks flung into my face) demonstrate that the front wheels reach their traction limit somewhere around 60A on asphalt.

The behavior of front wheel slip on a go-kart is something new to me. In a straight line, the initiation of the slip and the acceleration of the wheel actually isn't the biggest problem. It's when the wheel regains traction and slows down that bad things happen. The restored grip combines with the energy being dumped from the wheel's moment of inertia to generate a quick pulse of torque on that side, which creates a lot of torque steer.

To deal with this, I wanted to implement some form of traction control, at least for the front wheels, so that I could get as much torque out of them as possible without the steering disturbances and rock shooting. But first, I needed a way to easily configure both the motor currents and the traction control settings without having to drag around my laptop everywhere. So, I finally built out the steering wheel UI to include a bunch of settings:

Sorry for the exposure; it's the only way to capture the full OLED refresh period.

Anyone familiar with the MōVI Controller might recognize the OLED display. I chose this for daylight visibility and responsiveness (~50Hz update rate). The menu interface is essentially the same as the one I built the day before NAB 2014... The left knob scrolls through the menu. The right knob adjusts settings and, by clicking or holding, performs actions.

In the four corners are three parameters for the corresponding motor: S for Status, which shows error codes; F for Forward peak current; and R for Reverse (braking, or actually reversing) peak current. Setting both F and R to zero masks out the CAN command to that motor, triggering a timeout that turns off the gate drivers entirely. A click and hold on S triggers an encoder recalibration for that motor.

In the second column from the left, the first three settings relate to data logging: LS for Logger Status, FN for File Number (click to start a new file), and LT for Logger Time, the time in [ms] for a single row of the data log to be written. Then, there are two parameters for tuning traction control: TT for Traction Threshold, and TG for Traction Gain, which I will explain shortly.

The reason I wanted to be able to adjust peak currents from the steering wheel is that I agree with this early Tesla blog post: "it's much safer to avoid wheelspin altogether than react to it." If I know the surface supports front wheel current around 60A, there's not much point in setting it higher than that. But, I want to be able to set it higher for testing, or adjust it for different surfaces.

As for the traction control itself, there are a lot of corner cases to think about in 4WD, but the main problem I'm trying to solve is front wheel slip. If I assume the rear wheels are not slipping, then I can use their average speed as a reference. From there, it's easy to see if a front wheel is running faster than that reference, and reduce the current to that motor if so. This only needs two settings: a Traction Threshold (TT) that sets how much wheel slip is allowed, and a Traction Gain (TG) that sets how much to reduce the current per unit slip above the threshold. The Traction Threshold prevents overactuation in normal conditions and allows for the speed differential between wheels when turning.
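Here's a minimal sketch of that control law for one front motor. The function and struct names are hypothetical. The post doesn't specify units for TT and TG, so I'm assuming slip is measured as a plain wheel speed difference in whatever units the wheel speeds use. The sketch also assumes forward driving. It only acts during acceleration (positive current commands), and it never reduces the command below zero.

```c
/* Hypothetical settings; units are assumptions (speed units for TT,
   amps per speed unit for TG). */
typedef struct {
    float threshold; /* TT: allowed front speed excess over the rear reference */
    float gain;      /* TG: current reduction per unit of excess above TT */
} traction_params;

/* Returns the limited current command (A) for one front motor. */
float front_traction_limit(float cmd_amps, float front_speed,
                           float rear_left_speed, float rear_right_speed,
                           traction_params p)
{
    /* Rear wheels are assumed not to be slipping, so their average is
       used as the ground speed reference. */
    float ref = 0.5f * (rear_left_speed + rear_right_speed);
    float excess = (front_speed - ref) - p.threshold;

    if (excess <= 0.0f || cmd_amps <= 0.0f) {
        return cmd_amps; /* within threshold, or braking: leave it alone */
    }
    float limited = cmd_amps - p.gain * excess;
    return limited > 0.0f ? limited : 0.0f;
}
```

With TT = 1 and TG = 10, a front wheel running 3 units faster than the rear average is 2 units over threshold, so a 60A command is cut to 40A.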

But what happens if a rear wheel does slip? Well, then the front wheel might slip too. At that point, I'm probably in some kind of a four wheel sideways drift anyway, so alternate control laws are going to apply. Being able to trigger some rear wheel slip with the throttle is part of the fun, too, so having complete 4WD traction control isn't something I necessarily need to solve.

With the new UI setup and the simple front wheel traction control in place, it was time to do some tuning...

...or not.

At first, everything seemed to be going okay. I did a couple of runs at 60A front current and 80A rear current and the traction control seemed to be working as intended. But then during light regenerative braking at around 30mph, I heard the all-too-familiar sound of a FET popping, followed by some more bad noises and smells from the front drive. Upon inspection, only two FETs actually died, but they also took out many of the power traces, meaning this board was trash.

So what happened? Well, unfortunately, the data log was not very helpful in this case. It did show the speed (30mph) and current command (around -10A), but nothing out of the ordinary up until the point of failure. There is only one data point showing a Q-Axis current of 286A on the front left motor, followed by an undervoltage fault, which might have been the battery sagging or the power input traces getting blown up. So whatever happened, happened quick.

It's been a while since I've actually destroyed a motor controller, so I was a little disappointed. But after some thought, I didn't think this was due to the new traction control stuff. That's only applied during acceleration, and this failure definitely happened under braking. I think it's more likely that the front left motor just lost sync and the back EMF at 30mph was high enough to do damage. Up until now, I have only had a relatively slow overcurrent limit of 160A (or more) for 10ms. These FETs have a pretty insane Safe Operating Area (SOA), but that overcurrent limit still leaves room to exceed the SOA with currents above 400A:

This system could easily generate a 400A transient if a motor loses sync at 30mph. And the motor position and speed data does cut out at the same data point as the failure. But that's not enough to determine cause and effect. So for now I can only make changes that might help and hope for the best. I added in several more stages of faster overcurrent protection, up to 300A for a single ADC/PWM cycle (42.7μs). These overlap enough to cover the entire R_DS(on)-limited boundary of the SOA (up to the pulse rating of 1450A for 100μs!).
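One way to structure multi-stage overcurrent protection is a set of threshold/duration pairs. Each stage has its own counter of consecutive over-threshold ADC/PWM cycles. This is a hypothetical sketch, not the actual firmware. Two of the stages correspond to the limits above: 300A for 1 cycle, and 160A for 234 cycles (10ms is about 234 cycles of 42.7μs). The 220A/24-cycle middle stage is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float limit_amps;    /* trip threshold */
    uint32_t max_cycles; /* consecutive cycles allowed above the threshold */
    uint32_t count;      /* running count of consecutive cycles above it */
} oc_stage;

/* Hypothetical: call once per ADC/PWM cycle with the measured current.
   Returns true if any stage has exceeded its allowed duration. */
bool oc_update(oc_stage *stages, size_t n, float current_amps)
{
    float mag = current_amps < 0.0f ? -current_amps : current_amps;
    bool trip = false;
    for (size_t i = 0; i < n; i++) {
        if (mag > stages[i].limit_amps) {
            if (++stages[i].count >= stages[i].max_cycles) {
                trip = true;
            }
        } else {
            stages[i].count = 0; /* only consecutive cycles count */
        }
    }
    return trip;
}

/* Example table: fastest stage first. The middle stage is illustrative only. */
/* oc_stage stages[3] = {{300.0f, 1, 0}, {220.0f, 24, 0}, {160.0f, 234, 0}}; */
```

On a trip, the caller would disable the gate drivers. That part is left out here.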

A faster overcurrent trip doesn't help with whatever caused the motor to lose sync in the first place (if that is what happened). I have seen at least a couple of previous instances where the encoders, which supply emulated Hall effect sensor signals, have behaved as if they were completely reset. Even though I only use the buffered and optically isolated virtual Hall effect sensor signals for commutation, I was still reading the SPI data anyway. Maybe a SPI read got corrupted by noise and turned into a write that either reconfigured or entirely reset the encoder mid-run? To protect against this, I have now disabled SPI transactions entirely, other than during initialization and calibration.

So with these changes and my last and only spare drive, I went back out for another try. This time, I ran into no motor drive issues and was actually able to test and tune the front wheel traction control as I originally intended. The difference is immediately obvious while driving and in the data. First, a test at 80A front, 90A rear, with no traction control:

Front wheel traction control off.

As before, the front right wheel starts slipping at about 60A and spins up to 2-3x the actual ground speed. The front right always seems to lose grip first, a mystery to solve another day. When I let off the throttle and it regains traction, the torque pulse creates substantial torque steer, jerking the steering wheel almost 20° to the left, which I then have to counteract immediately to stay on course. Overall, it's impossible to sustain peak acceleration for more than a second or so before having to deal with the wheel spin and torque steer.

And now with the same currents, but front wheel traction control on:

Front wheel traction control on.

The front right (FR) current now averages a bit below 60A and its speed is held to just a small margin above the actual ground speed. It's never able to build up momentum and then "catch", inducing torque steer. This allows continuous acceleration up to and past 30mph. The front left (FL) also starts to slip in the 20-30mph range, but the traction control catches it too. The overall result is a much more controllable launch and far fewer rocks being thrown up by the front wheels.

After finding traction control settings that I liked, I switched back to current settings that more closely match the actual traction limits: 60A front and 100A rear. This still gives a reasonable 0.45g launch, but with less likelihood of triggering the traction control on asphalt. I'd like to push to >0.5g, to match tinyKart's most extreme configuration, but that'll either require 120A on the rear or changing the gear ratio a bit. At 60A / 100A, the front motors still share enough of the load that the rear motors stay at a healthy temperature after some acceleration runs:
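As a sanity check on the 120A figure, here's a simple linear estimate of the rear current needed for a target launch. It scales from the measured 0.45g at 60A/100A. It assumes that acceleration is proportional to total motor current, that all four motors and gear ratios are identical, and that front current stays fixed. None of that is exactly true, since drag, slip, and losses are ignored. The function name is hypothetical.

```c
/* Hypothetical linear model: launch acceleration proportional to the total
   current of four identical motors (two front, two rear). Returns the
   per-motor rear current needed to reach target_g with front current fixed. */
double rear_amps_for_target(double measured_g, double front_amps,
                            double rear_amps, double target_g)
{
    double total_measured = 2.0 * front_amps + 2.0 * rear_amps;
    double total_needed = total_measured * target_g / measured_g;
    return (total_needed - 2.0 * front_amps) / 2.0;
}
```

For a 0.5g target, it gives roughly 118A per rear motor, consistent with the ~120A estimate.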

Rear motors are doing most of the work, but...

...they are at a reasonable temperature.

And finally I did some less structured testing by just driving through the gravel corner in my parking lot and intentionally adding throttle to induce slip. It behaves pretty well, slipping and oversteering about the right amount to be controllable but still fun:

I think at this point most of the handling bottlenecks are back on the mechanical side. There's a small amount of backlash in the steering column that definitely exaggerates the residual torque steer, especially at high speeds. It's almost all coming from the U-joint, which I may try to shim or replace with one that has tighter tolerances. Other than that, I need to do some suspension geometry tweaking to improve handling of lateral transients. Speaking of which, here's one last data capture. See if you can figure out what's going on here...

Mystery data log.

Sunday, August 15, 2021

TinyCross: 4WD 80A Data Logging

It's been a long time since I did a proper test drive with TinyCross, although I've taken it out just for fun a few times. Since I completed the weight/width reduction pass last week, I wanted to get it out again and do some proper data logging in 4WD, with the peak current set to 80A for all four motors. This is still below the ultimate target of 100-120A (for short bursts), but plenty for parking lot testing.

Really enjoying the extra 2" of clearance - I can get through most of the "doors" in my building now.

I had to inflate the tires, but amazingly the air shocks don't seem to have leaked at all after a year of neglect. And they still do a pretty impressive job of soaking up the awful topography of my parking lot.

I wanted to do some more thorough data logging in 4WD to characterize some of the issues I've felt while just driving around for fun. The steering wheel PCB collects data from the front and rear motor drives over CAN, appends some of its own data, and writes the whole thing to a microSD card. When I first set this up, I just had it overwrite the existing data log every power cycle. But in the couple of years since I set that up, I've had to master FatFs. So setting it up to create new files on the fly without messing up any of the real-time stuff was an easy upgrade.

Here's what a 4x80A launch looks like:

4x80A launch (attempt).

The main problem is pretty obvious from the data: the front wheels just don't have enough weight on them to support 80A. If there's even a little bit of a loose surface, one or both front wheels will lose grip. Excessive wheel slip is inefficient, so the peak acceleration isn't as high as it could be if all four wheels hugged their grip limit. But front wheel slip is especially bad because it results in massive torque steer. (I actually used this to make remote-control TinyCross.) It also has a habit of throwing rocks up into the driver's face.

I've even debated whether the front wheel drive on TinyCross is worth the extra weight and complexity. tinyKart handled pretty well with RWD only: I could put in a controlled amount of oversteer with the throttle. In fact, I got a chance to test out how TinyCross feels with RWD only when I had - let's call it an 80/20 failure - on the front right upright:

Always check your T-nuts! The only real casualty was the encoder wire.

Although I was able to fix the mechanicals with the single hex driver I always bring with me, a few crimps pulled out of the encoder wire and I didn't have the tools to fix it. I could probably add a failover to sensorless operation for individual motors, but I'm not sure how well it'd work on the front motors, again because of torque steer. (Both fronts would have to agree to not produce torque until the flux estimator converges on the sensorless motor.) For now, I just removed power from the front drive.

In terms of handling, RWD works fine. But the launch is a mere 0.25g at 2x80A. There's no slip, and even if there was, it wouldn't matter as much on the rear since it doesn't induce torque steer.

2x80A launch.

Even at 120A, this would only be about a 0.4g launch. tinyKart, in its last and somewhat scary configuration, was hitting about 0.5-0.6g. Part of this is down to gearing: TinyCross, with 12.5" wheels, has to be geared for higher speeds. I could always ditch the front motors and switch to 80mm motors with more torque on the rear. But I think that goes against the spirit of TinyCross. Having full independent suspension and 4WD has always been the point.
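The gearing tradeoff follows from the force at the ground, which is wheel torque divided by wheel radius. This is a minimal sketch assuming no slip, drag, or drivetrain losses. The function name and the torque constant, gear ratio, and mass inputs are hypothetical. Only the 12.5" wheel diameter (0.15875m radius) comes from the kart.

```c
#define G0 9.80665 /* standard gravity, m/s^2 */

/* Hypothetical rigid-body launch estimate, in g. */
double launch_g(double kt_nm_per_amp, double amps_per_motor, int n_motors,
                double gear_ratio, double wheel_radius_m, double mass_kg)
{
    double wheel_torque = kt_nm_per_amp * amps_per_motor * gear_ratio; /* N*m per wheel */
    double force = n_motors * wheel_torque / wheel_radius_m;            /* N at the ground */
    return force / (mass_kg * G0);
}
```

Launch acceleration scales linearly with current, which is why going from 80A to 120A is only a 1.5x improvement (0.25g to about 0.375g). It also scales with gear_ratio / wheel_radius_m, so a bigger wheel needs a proportionally higher reduction for the same launch, at the cost of top speed.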

So I think I'll finally have to dive into writing some simple traction/launch control software. Just looking at the 4x80A launch data, it's easy to pick out the wheel that's slipping and imagine that the software could just fold back the current command to that wheel as its speed starts to diverge from the other three. But there are so many logical knots on the path to generalizing that to 4WD, where any subset of the four wheels could be slipping, that it makes my brain hurt to even think about.

There are some amazing technical blog posts from the early days of Tesla (back when it was more of an engineering project than a consumer electronics device) where they talk about how it took months to go from a controller with excellent high-bandwidth torque control to functioning traction control, and even then a lot of it was subjective. One observation I really liked:

This type of feedforward traction control can be hugely beneficial; for instance, it's much safer to avoid wheelspin altogether than react to it.

This was about a lateral G observer that fed into the friction model the traction control software used to limit motor torque to what it thought the tires could reasonably handle. This way, wheel slip might be limited to cases where there truly is a sudden drop in friction at one wheel. I think that should be the goal for this as well. I might even be able to just do slip detection on the front wheels. It'll be an interesting experiment, at least.
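A common way to express this kind of feedforward limit is the friction circle. Tires can supply at most about mu (in g) of combined acceleration, so the longitudinal acceleration left over during cornering is sqrt(mu^2 - a_lat^2). The sketch below assumes that model. It's my choice for illustration, not the model from the Tesla post. The function names and the inputs mu, lat_g, and amps_per_g (total motor current per g of acceleration) are hypothetical. It uses a small Newton's-method square root so it doesn't depend on libm.

```c
/* Newton's-method square root for small non-negative inputs (e.g. mu^2).
   Starting at or above the root, the iterates decrease monotonically. */
static double sqrt_newton(double v)
{
    if (v <= 0.0) {
        return 0.0;
    }
    double x = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++) {
        double next = 0.5 * (x + v / x);
        if (next >= x) {
            break; /* stopped decreasing: converged */
        }
        x = next;
    }
    return x;
}

/* Hypothetical friction-circle feedforward: current limit given the peak
   friction coefficient mu and measured lateral acceleration lat_g (both in g). */
double ff_current_limit(double mu, double lat_g, double amps_per_g)
{
    double budget_sq = mu * mu - lat_g * lat_g;
    if (budget_sq <= 0.0) {
        return 0.0; /* all available grip is being used for cornering */
    }
    return sqrt_newton(budget_sq) * amps_per_g;
}
```

With mu = 1.0, 0.6g of lateral acceleration leaves 0.8g for acceleration, so the current limit drops to 80% of its straight-line value.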

Saturday, August 7, 2021

TinyCross Weight and Width Reduction Pass

It's summer, which means it's time to work on go-karts. This round, it's a modification to TinyCross that I've been wanting to make ever since I first got it together about two years ago. The main issue is that I designed it around stock rear 12.5" scooter wheels. These are almost symmetric and have threads on both sides of the hub for mounting the drive sprocket and brake disk. But - and this is maybe my favorite bit of packaging on this project - I've got the brake and drive sprocket both mounted to the inboard side, with the brake caliper sitting right in the middle of the belt:

The brake and drive sprocket are both mounted to the inboard side of the wheel, making the outboard side of the hub dead weight.

This makes the extended length of the outboard side of the hub useless. But, I left it as stock for simplicity. I figured if I ever needed to replace the wheels, it would be easier to drop in a new stock 12.5" wheel. But, this drives the overall width of the kart up to about 35" for no good reason:

The total width, about 35", is driven in part by the symmetric 12.5" wheel hubs.

It's also unnecessary weight, especially factoring in the beefier 5"x5/8" hex standoffs I used to close the structural loop around each wheel. I figured I could take 2" off the total width and about 1lb off the total weight if I just bit the bullet and re-machined the 12.5" wheel hubs. It still wouldn't fit through a 32" door frame, but it would be easier to wiggle through indoor spaces and fit in my car. It also would just look a lot nicer.

One of the reasons I put off this modification for so long is because I thought it would involve disassembling the entire wheel module, but it turns out that it's just barely possible to remove the wheel without removing the motor. I can take off the brake caliper and slip the belt off the pulley to give it just enough slack to pull the wheel off the spindle shaft. I don't remember intentionally designing it this way, but let's pretend I did. It'll be good for fixing flats, too. 

The next obstacle to overcome was removing the outboard bearings. I didn't have a bearing puller on-hand, but I discovered that an 80/20 T-Nut (which I obviously have hundreds of...) is just about exactly the right size to push on the outer race of these bearings. So I came up with this improvised tool:

Improvised bearing pusher.

The tool is built inside the hub by slipping the 80/20 T-Nut through the bearing, flipping it horizontal, then dropping in the hex standoff from the other side. After fastening it together with a 1/4-20, it's ready for the press. Luckily, I didn't Loctite these bearings in, so they pressed out pretty easily.

Pressing out the bearings using the makeshift pusher.

The 12.5" wheels don't fit on my mini lathe, but they do just barely fit on my mini-mill. I knew this ahead of time, so I bought a 22mm end mill specifically for cutting the new bearing pocket. (One of the nice features of this mini-mill is its use of a regular R8 spindle, so it's possible to get large tools for it.) I did have to get a little creative with fixturing. The brake disk is bolted down to a piece of 80/20, which is clamped in the mill. But, to make things stiff enough, I also had to ground the rim itself directly to the bed with some long clamping screws.

Clamping situation: not great, not terrible.

Pretty sure this mill was never meant to hold a tool this big.

I decided to extend the bearing pocket by 1.000" first, before machining down the hub by 1.000". I'm not sure if this was the best order of operations, but it all went pretty smoothly. Here's 7:45 of relaxing slow-motion bearing pocket cutting, captured at 4K 420fps with my Wave:

These hubs are cast aluminum, so it wasn't surprising to find that there were some voids in the newly-machined faces. They're nothing that I think would affect the structural integrity, but it's an interesting consequence of the manufacturing process.

Casting voids exposed by re-machining the hubs.

One of the downsides of doing this operation on the mill is that I didn't have the option of machining the new bearing pocket to an interference fit. But I was pleased to see that, with all the extra effort put into stiffening the fixture, it was still a nice slip fit. I can always add Loctite later if needed.

After re-machining, the bearings are now a nice slip fit.

That just leaves the 7075 spindle shafts, which also needed to be shortened by 1.000". Cutting off the extra length and extending the outboard mounting hole was a quick task for the mini-lathe. Then, it just needed to be re-tapped.

Shortening the 7075 spindle shafts...

...and re-tapping.

Finally, I put everything back together, substituting much lighter 4"x1/2" hex standoffs to span the gap at the top of each wheel module. The total process took only about two hours per wheel, including disassembly and reassembly. So something I have put off for two years was really only one day of work...typical. Anyway, the final result is a kart that's now 2" narrower and about 1lb lighter.

The pile at the front is roughly the weight saved. (5"x5/8" standoffs were replaced by 4"x1/2", but an equivalent amount of weight was taken out of each hub.)

I have a few more tasks I want to do on this kart. It still needs to be fully weather-proofed. I have a plan for enclosing the motor drives, but need to figure out something for the steering wheel PCB. I may redesign that board from scratch since I don't think I'll ever get to using the battery balancing circuit on it. It can be much smaller and simpler without that. Lastly, there's always motor drive stuff to fiddle with to squeeze out more torque and/or speed.

For now, though, I'm glad it's a little lighter and a lot narrower. It'll make deploying it that much easier, which ultimately means more actual testing and use.