Archive for August, 2010


Time-of-flight measurement


One of the requirements of the product I’m developing is that I be able to tell the relative distance along a wire of all the nodes in the network.  Because they are sitting on the bus in parallel and are not allowed to cut and internally bridge the wire, I have to use some arcane methods to figure out where they all are.

What I found was a chip from ACAM called the GP2, which is a time-of-flight measurement chip capable of 50 picosecond resolution measurements between start and stop pulses.  Using this, I can tell each node in turn to place a short across the bus (held with power from a cap), and measure how long it takes for a pulse to traverse one half of the wire, the node, and the other half.  By making averaged measurements and sorting the nodes, I can determine their order on the bus.
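In pseudocode-ish C, the ordering scheme boils down to something like this (a sketch only — `measure_tof_ps` and the sample count stand in for the real tell-node-to-short-the-bus-and-read-the-GP2 sequence, and `fake_tof_ps` is a canned example producer):

```c
#include <stdint.h>
#include <stdlib.h>

#define TOF_SAMPLES 16   /* arbitrary averaging depth */

struct node {
    uint8_t  id;
    uint32_t avg_ps;     /* averaged round-trip time, picoseconds */
};

/* Sort helper: ascending by averaged time-of-flight. */
static int by_delay(const void *a, const void *b) {
    uint32_t da = ((const struct node *)a)->avg_ps;
    uint32_t db = ((const struct node *)b)->avg_ps;
    return (da > db) - (da < db);
}

/* Illustrative stand-in for "tell node N to short the bus, fire a pulse,
   read the GP2" -- returns a canned delay per node id. */
static uint32_t fake_tof_ps(uint8_t id) {
    return id == 3 ? 100 : (id == 1 ? 200 : 300);
}

/* Average several measurements per node, then sort into bus order. */
static void order_nodes(struct node *nodes, size_t count,
                        uint32_t (*measure)(uint8_t id)) {
    for (size_t i = 0; i < count; i++) {
        uint64_t sum = 0;
        for (int s = 0; s < TOF_SAMPLES; s++)
            sum += measure(nodes[i].id);
        nodes[i].avg_ps = (uint32_t)(sum / TOF_SAMPLES);
    }
    qsort(nodes, count, sizeof nodes[0], by_delay);
}
```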

The first generation board for this included a bunch of jumpers to allow me to reconfigure the various timing-side pins:

It took quite a bit of time to work out how to get the chip to function in the first place, hence the extra resistor and jumper wire around the crystal.  Turns out ACAM put a pretty fragile oscillator on their chips, which requires both a drive-side series resistor (220R) and a parallel feedback resistor (560K) in order to work.  Contrast that with a conventional 8-bit microcontroller, which generally runs with nothing but the bare crystal, though it’s “cleaner” with load caps.  Sigh.

The jumpers allow me to enable the start and stop inputs, route the fire-pulse generator to start and the line interfaces, gang the two generators together (for 96mA drive current), and configure the “fire-around” mechanism that I don’t entirely understand since I don’t need it…

In an effort to make the design more useful for integration into a protoboard form of the network master, I redesigned it as a 600mil DIP module:

All the same functionality is present, but without all the jumpers.  The 32KHz calibration crystal is missing from this board because Digi-Key had the ECS-.327-7-38-TR listed as a 4.9×1.8mm crystal rather than the 7.0×1.5mm it really is (and still does; my notification to them apparently hasn’t taken effect yet)…  I have actual 4.9×1.8mm crystals coming on Monday or Tuesday, so I’ll fill that in.  However, since I don’t actually care about absolute accuracy at this point, I couldn’t care less about the calibration clock.  On the bottom is an (unpopulated) site for an MCP1703-3.3 regulator, with pins on the top left for both +5V and +3V3.  All the power and digital I/O is on the top side, while all the measurement-side pins are on the bottom.

Getting the chip actually running was a monumental exercise in frustration, since the datasheet (“registration” required) for the chip is easily the worst-laid-out document of its type I’ve ever run across.  While the designers of the time-of-flight core in the chip were clearly rather talented, whoever did the SPI and register front-end needs to be shot. Repeatedly.  That being said, I plan on writing up a better-organized and more coherent “quick start” along with default registers for the module…  I’ll probably even publish (as-is) the Python GUI I created to tweak all the registers more easily, which operates via a Bus Pirate.

The first board pictured is about to be shipped to a guy in Brazil who managed to find my un-set-up webshop with it listed.  At some point soon I need to try rebuilding the shop with a different backend and actually get it up and running fully, at which point I’ll be offering the DIP module for sale.  Lead time on the module will be somewhat long though, since the one pictured above has the last of the 3 chips I purchased at ~$32 each.  If anybody wants one, I’ll have to order more chips first.  I have 2 unpopulated boards in my possession at the moment, so if you’re interested in one, speak up ;-)


Desktop disaster of the day


Here’s a shot of my desk from yesterday:

The red boards on the right are SparkFun FT232RL adapters.  Bottom right is a Teensy++.  The two long boards are my ATxmega256A3 adapter boards.  The red and blue square boards are SparkFun nrf24l01+ radio boards.  Hanging mid-air is my ATxmega32A4 + nrf24l01+ board, with its debug lines soldered to jumpers that hold it midair.  Below the protoboard is a nrf24l01+ board designed to mount to a Teensy, for USB bridging.  Just above the USB key on the left is an edge view of SparkFun’s nrf24LU1 board that’s intended to supersede the Teensy-based unit. The little round board at the bottom is a compass-based servo board for a borehole geophone.  To the left above the red Sharpie is the latest version of my ACAM GP2 development board, used for ~50ps resolution time-of-flight measurements.  Its predecessor is the square board just above the protoboard.  On the extreme left edge you can see the round 2-board stackup of the 3rd revision of my main project, and the little Bluetooth debug adapter just below that.  Misc cables, programmers, hubs, tools, etc. are scattered everywhere else.

And that’s just that part of my desk.  The next 3 or 4 feet left contains my soldering environment, with iron and toaster, hand tools, and piles of parts everywhere…..

Wireless PDI/debug interface development


After spending too much time messing with Bluetooth for programming and debugging my current project, I decided it was time to make some progress on a project that’s been bouncing around in my head and on my hard drive for a while, and recently acquired a decent name: ioStack.

The immediate requirement is that I be able to both program a microcontroller via PDI and interface with its debug serial port, using a wireless device that can be plugged into and powered by the target, on a connector that’s really crazy small.

Connector-wise, I discovered that the USB micro-B jack is actually smaller than the JST ZR-series I was using.  Given that making wiring harnesses with the ZR is an absolute nightmare, switching to a connector for which pre-made cables are available is a major plus.  The drawback is that USB only has 5 pins, which is a slight problem when dealing with two 2-pin interfaces plus power and ground.  Luckily, I found a really slick solution to that problem.

The next issue is developing a device that can actually talk over this “PDI5” interface.  The PDI and serial ports are mutually exclusive, and can’t simply be two separate interfaces tied together.  As per the post linked above, the 3 data lines must be wired directly to a single sync-capable serial port, which basically means there must be a microcontroller at the core of the program/debug device itself.

Finally, the wireless requirement has several potential solutions ranging from the aforementioned Bluetooth, through existing protocols such as Zigbee/802.15.4, and ending at custom protocols based on commonly-available radio chips.  That final course leads to using an NRF24L01+ connected to the core microcontroller.  While not providing any relevant protocol stack, it does give me a very high bandwidth multipoint-capable wireless PHY.  While Zigbee and ANT seem to run in the 20-64Kbps burst range, the bare NRF24 runs at up to 2Mbps.

Now, to tie all this together I need a decent amount of software.  First off was a PDI programming library, which turned out to be not overly difficult.  An STK500v2/600-compatible protocol handler turned out to be a little more complicated.  For a little background on why, the ioStack concept is based on raw byte streams rather than packetized units.  Internally, everything intending to communicate via these streams has not much more than putbyte and getbyte routines.  That means that the stk500v2 code must be configured as a strict state machine, transitioning on every byte.
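To make the “strict state machine, transitioning on every byte” idea concrete, a byte-at-a-time STK500v2 frame parser ends up looking roughly like this (frame layout per Atmel’s AVR068 spec: 0x1B start, sequence number, 16-bit size, 0x0E token, body, XOR checksum; the state and struct names here are made up for illustration):

```c
#include <stdint.h>
#include <stddef.h>

enum stk_state { ST_START, ST_SEQ, ST_SIZE_HI, ST_SIZE_LO,
                 ST_TOKEN, ST_DATA, ST_CKSUM };

struct stk_parser {
    enum stk_state state;
    uint8_t  seq, cksum;
    uint16_t size, pos;
    uint8_t  body[275];          /* max STK500v2 message body */
};

/* Feed one byte; returns 1 when a complete, checksum-valid frame arrived. */
static int stk_putbyte(struct stk_parser *p, uint8_t b) {
    switch (p->state) {
    case ST_START:
        if (b != 0x1B) return 0;            /* resync on MESSAGE_START */
        p->cksum = b; p->state = ST_SEQ; return 0;
    case ST_SEQ:
        p->seq = b; p->cksum ^= b; p->state = ST_SIZE_HI; return 0;
    case ST_SIZE_HI:
        p->size = (uint16_t)b << 8; p->cksum ^= b;
        p->state = ST_SIZE_LO; return 0;
    case ST_SIZE_LO:
        p->size |= b; p->cksum ^= b; p->pos = 0;
        if (p->size > sizeof p->body) { p->state = ST_START; return 0; }
        p->state = ST_TOKEN; return 0;
    case ST_TOKEN:
        if (b != 0x0E) { p->state = ST_START; return 0; }
        p->cksum ^= b; p->state = p->size ? ST_DATA : ST_CKSUM; return 0;
    case ST_DATA:
        p->body[p->pos++] = b; p->cksum ^= b;
        if (p->pos == p->size) p->state = ST_CKSUM;
        return 0;
    case ST_CKSUM:
        p->state = ST_START;
        return (p->cksum == b);             /* frame good iff XOR matches */
    }
    return 0;
}
```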

Complicating matters further, the PDI protocol itself is not particularly well-suited to sitting around very long during a read.  At most, you can tell it to insert 128 bits of dead time during turnaround, then it starts clocking bytes at full speed.  The other way around is also a problem: the actual stream multiplexer is currently somewhat intolerant of delays as well.  As a result, I’ve arranged the state machine so that all of the PDI action happens in the CKSUM state, which means the host (e.g. avrdude) has finished sending an entire frame before it gets put off for any period of time by things like block reads.

To connect the nrf24, I’ve got a chunk of code on top of the actual nrf24 driver that provides a rudimentary byte stream.  This will need to be updated to actually be efficient, i.e. sending more than 1 byte per frame, but it gets the job done with the right abstraction already in place.  I’ve got a lot more nrf24 deciphering and experimentation to do before I can make that module work noticeably better.  Regardless, between the USART, STK500V2, and NRF24 modules I have a multiplexer to direct bytes to the right streams and handle polling the downstream modules for sending back upstream.
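In case the stream abstraction isn’t clear, the multiplexer is conceptually not much more than this (a simplified sketch with illustrative names; the real framing is still in flux):

```c
#include <stdint.h>
#include <stddef.h>

/* Each downstream module exposes a getbyte-style poll; the mux tags every
   byte it collects with a stream id before it goes upstream.  This matches
   the rudimentary one-byte-per-frame state described above. */
struct stream {
    uint8_t id;
    int (*getbyte)(void *ctx, uint8_t *out);  /* 1 = byte was available */
    void *ctx;
};

/* Poll each module once, emitting [id, byte] pairs into 'out'.
   Returns the number of bytes written. */
static size_t mux_poll(struct stream *streams, size_t n, uint8_t *out) {
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b;
        if (streams[i].getbyte(streams[i].ctx, &b)) {
            out[w++] = streams[i].id;   /* stream tag */
            out[w++] = b;               /* payload byte */
        }
    }
    return w;
}

/* Example producer: a memory buffer pretending to be a module. */
struct buf { const uint8_t *d; size_t len, pos; };
static int buf_get(void *ctx, uint8_t *out) {
    struct buf *b = ctx;
    if (b->pos >= b->len) return 0;
    *out = b->d[b->pos++];
    return 1;
}
```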

Another Xmega/nrf24 pair acts as an “access point”, forwarding data between the NRF24 and the USART attached to the PC.  A Python script on the PC does the other half of the multiplexing, attaches the USART “pipe” to the console, and provides a TCP socket tied to the stk500v2’s “pipe”.  Using this mechanism, I can both use avrdude to program the chip, and communicate with the chip’s debug serial port.  This happens over a wireless connection, meeting the original goal of the project ;-)

I’ll try to post some more coherent info on the “ioStack” project soon, since it’s starting to come together finally.


Xmega questions?


Since I’ve touched almost every module in the Xmega at a reasonably comprehensive level, I figured I’d open up the floor to readers to suggest which modules they’d like more information on first.  If there’s a part of the chip you’d like more info about, let me know and I’ll see if I can write something up.  Hopefully I can fold the material into my “getting started” sequence over time.


Xmega fractional baud-rate source code


Earlier I posted a spreadsheet I created that calculated the BSEL and BSCALE for the Xmega’s fractional baud-rate generator.  This works well to determine what the potential is for getting your chip to run a viable baud-rate for a given clock, but isn’t so useful when you actually want to write a configurable piece of code.

Since then I’ve developed two methods for generating the appropriate register settings for a given baud rate.  The first method was designed around the original constraints I had, which were that the CPU frequency and baud-rate were set statically in the source code, and never changed or dealt with programmatically.  As such, it’s a set of macros that determine the best available BSEL and BSCALE:

#ifndef __XMEGA_BAUD_H__
#define __XMEGA_BAUD_H__

#define _BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,bscale) (                \
((bscale) < 0) ?                                                      \
  (int)((((float)(f_cpu)/(8*(float)(baud)))-1)*(1<<-(bscale)))        \
: (int)((float)(f_cpu)/((1<<(bscale))*8*(float)(baud)))-1 )

#define _BSCALE(f_cpu,baud) (                                         \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-7) < 4096) ? -7 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-6) < 4096) ? -6 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-5) < 4096) ? -5 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-4) < 4096) ? -4 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-3) < 4096) ? -3 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-2) < 4096) ? -2 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-1) < 4096) ? -1 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,0) < 4096) ? 0 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,1) < 4096) ? 1 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,2) < 4096) ? 2 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,3) < 4096) ? 3 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,4) < 4096) ? 4 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,5) < 4096) ? 5 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,6) < 4096) ? 6 :                \
7 )

#define BSEL(f_cpu,baud) _BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,_BSCALE(f_cpu,baud))

#define BSCALE(f_cpu,baud) ((_BSCALE(f_cpu,baud)<0) ? (16+_BSCALE(f_cpu,baud)) : _BSCALE(f_cpu,baud))

#endif /* __XMEGA_BAUD_H__ */

(beware the line continuations!)  Basically, the BSCALE macro steps through the -7…+7 range hunting for the largest BSEL value that’s still legal (12-bit), and the BSEL macro uses that scale to generate the right divider.  A typical usage would be something like this:

#define F_CPU 32000000
#define BAUDRATE 115200
USARTC0.BAUDCTRLA = (uint8_t)BSEL(F_CPU, BAUDRATE);
USARTC0.BAUDCTRLB = ((BSEL(F_CPU,BAUDRATE)>>8)&0x0F) | (BSCALE(F_CPU,BAUDRATE)<<4);


More recently I’ve been developing a more “object-oriented” set of routines that let me stack one thing on top of another (more about that later).  As a result, I needed a form of the above code that would work at runtime.  As you can see from the first macro in the above code, a naive floating-point approach would bring a microcontroller to its knees (as in: it could take entire seconds to calculate…).  To solve this I looked at the problem from a different perspective, and ended up with the following code:

#define F_CPU 32000000UL  /* UL so F_CPU*128 stays in unsigned 32-bit range */

uint8_t xmega_usart_setspeed (USART_t *usart, uint32_t baud) {
  uint32_t div1k;
  uint8_t bscale = 0;
  uint16_t bsel;

  if (baud > (F_CPU/16)) return 0;

  div1k = ((F_CPU*128) / baud) - 1024;
  while ((div1k < 2096640) && (bscale < 7)) {
    div1k <<= 1;
    bscale++;
  }

  bsel = div1k >> 10;

  usart->BAUDCTRLA = bsel&0xff;
  usart->BAUDCTRLB = (bsel>>8) | ((16-bscale) << 4);

  return 1;
}
The above code will result in the best available baud rate, calculated with 0.1% precision (but does not guarantee 0.1% baud-rate accuracy), using only a single 32-bit divide.  My current headache prevents me from properly explaining how it works, but the clever reader should be able to puzzle it out pretty quickly.  I’ll try to replace this excuse with an actual explanation at some point in the future.  If I haven’t yet, write a comment reminding me….
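In lieu of that explanation, here’s the same search run host-side for 115200 baud at 32MHz, so you can at least watch the numbers fall out (a standalone sketch, not the AVR code itself):

```c
#include <stdint.h>

#define F_CPU 32000000UL   /* UL keeps F_CPU*128 in unsigned 32-bit range */

/* Host-side copy of the search: div1k holds (divider * 1024) at bscale 0,
   and each left shift trades one step of BSCALE for a doubled BSEL. */
static void xmega_baud_calc(uint32_t baud, uint16_t *bsel, uint8_t *bscale) {
    uint32_t div1k = ((F_CPU * 128) / baud) - 1024;
    uint8_t bs = 0;
    while ((div1k < 2096640) && (bs < 7)) {  /* 2096640 = 2047.5 * 1024 */
        div1k <<= 1;
        bs++;
    }
    *bsel = (uint16_t)(div1k >> 10);
    *bscale = bs;   /* register BSCALE is -bs, stored as 16-bs */
}
```

For 32MHz and 115200 this lands on BSCALE = -6, BSEL = 2158, giving an actual rate of about 115211 baud, roughly 0.01% high.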

If you’re running on a system with a variable system clock (e.g. stepping the clock up for a burst of performance and back down for a long sleep), you could easily modify the function to take the F_CPU as a function parameter rather than a #define.  Replacing the (F_CPU/16) with (F_CPU>>4) and (F_CPU*128) with (F_CPU<<7) might be necessary to hint the compiler, but everything else should work the same.  You could then precalculate and store the BAUDCTRL values for each clock speed, and swap them in as needed, or if your clock is more variable than that, just run the calculation each time.

I haven’t profiled the runtime of the code yet, but I suspect it’s well under 1000 cycles, dominated by the 32-bit divide.