Archive for the ‘Software’ Category


My magic AVR build script


Don’t get me wrong, I like make…  Hah! Yeah, right!

make has its place, no doubt, but small microcontroller projects not so much.  In my projects I like to just start coding, pulling in existing code where it makes sense.  Maintaining a makefile for that gets old really fast, and is massively overwrought compared to the requirements.

So, over the last couple years I’ve developed my own scripts to build AVR projects.  The latest variation has just received a major speedup, so I thought I’d share it along with some selected Bash tips.



Graphic demonstration of remote clock sync


So I’ve been perfecting the method I use in my current project to sync up the two clocks.  In a previous post I explained the generation method, which consists of using a master-generated reference pulse on the data bus to drive a PID that pushes the slave’s crystal frequency around to get them to line up.

Originally I’d maintained a local difference between the master and slave clocks, and any request for the clock would be adjusted by that.  However, I realized it would be very useful for the actual hardware timers to line up with the same values, and it turned out to be almost a trivial change.  Instead of just taking the offset and storing it, I pause the timers and reset them with a value that’s adjusted by the offset (plus a fudge factor for the time it takes to actually change the timers).  Any slight deviation left over can be handled by the I and D of the PID controller.
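To make the coarse-step-then-PID idea concrete, here is a toy simulation. It is only a sketch: the gains, the residual `fudge_error`, the per-pulse drift, and the function name `simulate_sync` are all made up for illustration and have nothing to do with the real firmware or its tuning.

```python
def simulate_sync(initial_offset, drift_per_pulse, pulses=200,
                  kp=0.5, ki=0.1, kd=0.1, fudge_error=0.5):
    """Return the phase error (in cycles) seen at each reference pulse.

    All parameters are hypothetical; this models the idea, not the hardware.
    """
    history = [float(initial_offset)]   # error seen at the first pulse
    # Coarse step: reload the timers with the offset-corrected value.
    # Only the mis-estimated fudge factor is assumed to be left over.
    error = fudge_error
    integral = 0.0
    previous = error
    for _ in range(pulses):
        history.append(error)
        integral += error
        # PID output is a frequency trim applied until the next pulse.
        trim = -(kp * error + ki * integral + kd * (error - previous))
        previous = error
        # Phase moves by the crystal's natural drift plus the trim.
        error = error + drift_per_pulse + trim
    return history


errors = simulate_sync(initial_offset=12345, drift_per_pulse=0.25)
print(errors[0], errors[1], abs(errors[-1]))
```

The first entry is the raw offset, the second is what remains after the timer reload, and the rest is the PID slowly cancelling the constant drift (that is the integral term's job).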

The main cycle counter is actually two daisy-chained 16-bit timers, and the daisy-chain mechanism is what on Xmega is called an “event”.  If I select the right event channel (0 out of 0..7) and flip a few config bits, I can output the 16th bit of the cycle counter to a pin, which I can then connect my scope to.  I hooked up my scope to this output on both the master and slave event outputs, and set it to trigger off the master’s pulse.

Initially there is only a master trace (channel 1) visible, with the slave (on channel 3) somewhere else entirely along the 488.3Hz cycle (32MHz / 2^16).  Halfway through the video I hit the key that triggers the sync protocol, and you can immediately see the slave’s clock trace show up and home in on a lock.

A quick eyeball measurement of the jitter and delay gives me about a 1 cycle average delay (32MHz cycle), or about 31ns, and a worst-case jitter of about ±2 cycles, or about 62ns. I’m betting some of this has to do with the tuning of my PID, as I can clearly see some periodic overshoot in the crystal adjustment parameter. My goal was supposed to be around 25,000ns if I remember right, so I think I’ve managed fairly well ;-)
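For anyone who wants to check the arithmetic, here it is spelled out. The 32MHz clock, the 16-bit counter, and the 25,000ns goal come from the numbers above; the variable names are just mine.

```python
F_CPU = 32_000_000        # CPU clock in Hz
COUNTER_BITS = 16         # width of the low timer in the daisy chain

overflow_hz = F_CPU / 2**COUNTER_BITS   # rate of the 16-bit overflow event
cycle_ns = 1e9 / F_CPU                  # one CPU cycle in nanoseconds
jitter_ns = 2 * cycle_ns                # 2 cycles of worst-case jitter
goal_cycles = 25_000 / cycle_ns         # the original goal, in cycles

print(f"overflow rate: {overflow_hz:.2f} Hz")
print(f"one cycle:     {cycle_ns:.2f} ns")
print(f"2 cycles:      {jitter_ns:.1f} ns")
print(f"goal:          {goal_cycles:.0f} cycles")
```

So the measured jitter is a couple of cycles against a budget of several hundred.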

Next step is to trigger a single pulse at another time and use that to start the ADC’s conversion clock…


Getting started with nrf24: Introduction


So anybody who’s been following my blog for the last few weeks will have noticed I’ve been fighting with the nrf24 series of chips, trying to get a wireless protocol in place I can use for program & debug capabilities.  Unfortunately, finding any kind of coherent and properly documented example code, or tutorials that make sense, has been rather hard.  I’ve found a few that at least got me partially started, in particular diyembedded’s tutorials, even though they’re based on the PIC and ARM.  However, the many and varied code examples all seem to say slightly different things, generally in less than straightforward ways.  Thus, I’m going to attempt to write my own tutorial series, as I bootstrap my way up through the available functionality.  It’s going to be very code-based, yet hopefully in a form that’s not too confusing.  This will also provide me with the opportunity to solidify my own understanding of the chip(s).

To start things off, the nrf24 series of chips from Nordic Semiconductor has gained a lot of popularity in both product and DIY circles because of its (relative) simplicity and low cost.  The radios themselves operate in the 2.4GHz band, and have a maximum air bitrate of 2Mbps in a 2MHz channel.  They use GFSK modulation with a -82dBm sensitivity at 2Mbps, giving them a range in the tens of meters on average.  There are relatively few external components required, though I’ve had some problems there myself…

The tutorial sections themselves are [will be]:

  1. Hardware connection
  2. Physical layer communication
  3. Trivial transmission
  4. …[TBD]

The sections will be linked here as they are written, and this will eventually be the root document for the tutorial.


Reduced-pin-count program+debug connector for Xmega


I’ve been struggling recently with the extremely small JST SH connectors, which are 1.0mm crimp-style.  No matter what wire I use, I can’t get them to stop breaking, and they’re a royal pain to put together in the first place.  This is a problem, because it’s one of the only connectors I can find that’s small enough to fit on my current project’s board with 6 pins.  I need those pins for power, ground, the two Xmega PDI lines, and a debug serial port.  Currently I connect either a standard AVR-ISP mkII or a Bluetooth serial adapter to the boards via this tiny connector, and all that wear and tear on the connector and wires is becoming quite evident.

I’m about to ditch Bluetooth for quite a few reasons, but that’s a whole other story.  Suffice it to say: the replacement will be an Xmega-based device, and that’s where this nifty little trick comes from.

I did some hunting and found that there actually is a connector that’s even smaller than the SH-6: USB micro-B.  It’s a hair narrower, and about 20% shallower, resulting in that much more board space to work my routing magic.  The only problem: USB only has 5 pins….

My first (or near enough) thought was “use the shield”.  Since I’d be constructing the cable from scratch, I can solder a 6th wire to the shield (which is not otherwise connected to anything but the cable’s braid) and use that as ground.  Cheesy, but functional.

Then I thought about how the PDI protocol is actually implemented.  I’d rewritten most of the PDI stack based on the example code in LUFA recently, and it turns out that PDI is basically a bidirectional synchronous serial protocol.  That means that RX and TX are both connected via 220R resistors to the PDI_DATA line, and XCK drives PDI_CLK.  This presents the possibility that the PDI and serial lines could be shared.

As it turns out, this works very nicely.  On my breadboard I hooked up an Xmega target and Xmega programmer, with a crazy serial loopback chain that routed from an FT232R through the programmer to the target, and out another FT232R.  The PDI and serial ports from the target are routed to the middle, where the resistors create PDI_DATA from RX and TX.  PDI_CLK, RX, and TX hop over to the three pins on the programmer.  The software running on the programmer is a two-port serial loop-through, with the exception of a “p” coming from the test computer.  In that case, it shuts down the normal serial port, starts up the PDI interface, confirms the chip’s signature, then switches back.

The end result is a 5-pin interconnect between the two chips with both full hardware programming and serial debugging capabilities.  As such, we can now route it through a USB micro-B connector ;-)


Now, this won’t work with a “discrete” PDI programmer, since you only get the combined Rx+Tx line out of the programming header, and this trick depends on combining them on the device.  Thus, you pretty much have to have a “custom” unit doing the programming and serial bridging.  However, that’s exactly where I’m headed after Bluetooth bites the dust…


Binary Field Search – re-entrant re-implementation of re[-]cursion


(say that i++ times fast!)

As part of the main product I’m working on right now, the master node must be able to identify and communicate with multiple nodes on a shared bus in an addressable manner.  This requires a mechanism to enumerate the devices in a reasonably controlled time based on their serial number or other signature data.  Because it’s a shared medium, asking all the stations to respond just ends up with a complete jumble on the bus.  Stepping through the entire potential space saying “does 1 exist? does 2 exist? does 3 exist? …” is wildly impractical given the nominal 32-bit search space – asking 4+ billion times isn’t in our budget…

The solution is a derivative of the standard binary search.  Normally you’re only looking for a single value, so you just keep checking bits until you find the one correct value.  For 8-bit, you’d start with “greater than 0x80, or less?”.  If less than, “greater than 0x40, or less?”.  If greater, “greater than 0x60, or less?”, rinse, repeat.  The challenge of a field search rather than a [single] value search is that once you’ve homed in on the first value, you have to back up and continue searching the rest of the space.

I previously implemented a strictly recursive version for ease of development (since I was fighting with the mechanism of asking the question at the same time).  It’s a fairly trivial extension of a value search: instead of breaking all the way out after finding a full match, it doesn’t.  Here’s [pseudo] pseudo-code for it, with the comments marking the difference between value and field search:

(beware I’m writing this off the top of my head and I can almost guarantee it’s not strictly correct)

# check all stations for *one or more* match of the upper (maskbits) bits with (value)
bool question(uint32_t value, int maskbits):
  mask = 0xffffffff << (32-maskbits)
  for station_signature in all stations:
    if (station_signature & mask) == value:
      return True
  return False

void search(uint32_t value, int curbit):
  if curbit == -1:
    add_station(value)          # field search; a value search would return True
    return
  value[curbit] = 0
  if question(value, 32-curbit):
    search(value, curbit-1)     # field search; a value search would return this result
  value[curbit] = 1
  if question(value, 32-curbit):
    search(value, curbit-1)     # field search; a value search would return this result


A value search returns up through the entire recursive stack as soon as it finds the first match.  Remove the return stack and let it keep going, and you’ve got a field search.
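For anyone who wants to actually run the recursive idea, here is a minimal Python sketch. It is not the firmware: the `stations` list stands in for the shared bus, `question()` fakes the "does anybody match this prefix?" query, and the `field_search` wrapper and its return values are my own invention for the example.

```python
def question(stations, value, maskbits, bits=32):
    """True if one or more stations match the upper `maskbits` bits of value."""
    shift = bits - maskbits
    return any((sig >> shift) == (value >> shift) for sig in stations)


def field_search(stations, bits=32):
    """Recursively enumerate every station; returns (found, query_count)."""
    found = []
    queries = 0

    def search(value, curbit):
        nonlocal queries
        if curbit < 0:
            found.append(value)      # every bit matched: this is a station
            return
        for bit in (0, 1):           # try the 0 branch, then the 1 branch
            candidate = value | (bit << curbit)
            queries += 1
            if question(stations, candidate, bits - curbit, bits):
                # A value search would return from here on the first hit;
                # the field search simply carries on with the other branch.
                search(candidate, curbit - 1)

    search(0, bits - 1)
    return found, queries


stations = [0xDEADBEEF, 0x00000001, 0x80000000, 0xDEADBEEE]
found, queries = field_search(stations)
print([hex(s) for s in found], queries)
```

Stations come out in ascending order because the 0 branch is always tried first, and the query count stays in the hundreds rather than the billions.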

The problem is that this implementation is monolithic, and runs until completion.  For a large enough number of stations it can take many hundreds of queries before it finds all the nodes.  For the product in question that’s not a fatal problem, but I’d really like to be able to send other packets on the bus with a particular regularity that would be precluded by having the search run to completion.  As a result, a recursive technique isn’t in the cards since I have no threading stack on the chip.  That means developing a re-entrant form that works on the same principle.

What I’ve developed (with no pretenses that it’s new, FWIW) is a fairly basic state machine that uses the value and curbit variables alone, rather than in combination with the current program counter as it steps from the clear to set portions of search().

The basic rules are simple: every call generates a query based on the current value and curbit, then modifies them as appropriate depending on the success of the query and the bits present in value.  Initial state is value = 0x00000000 and curbit = 31.  For each iteration: if the query matches and we’re not on the last bit, simply move to the next bit down (curbit--) and start over.  If it matches on the last bit, we’ve found a station; if that bit is a zero, set it to one before starting over.  If it’s a one, we “pop” all contiguous 1’s off the end (e.g. 01001011 pops twice) while resetting them to zero, change the zero above that to a one, and start over.  If the query doesn’t match, we check the current bit.  If it’s a zero, we change it to one and start over.  If it’s a one, “pop” all the ones and change the preceding zero to a one before starting over.  If we pop to a bit that doesn’t exist (in this case 32), we’re done with the entire search space.

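pop():
  while curbit < 32 and value[curbit] == 1:
    value[curbit] = 0
    curbit++
  if curbit < 32:
    value[curbit] = 1

step():
  if question(value, 32-curbit):
    if curbit == 0:
      add_station(value)
      if value[curbit] == 1: pop()
      else: value[curbit] = 1
    else:
      curbit--
  else:
    if value[curbit] == 1: pop()
    else: value[curbit] = 1
  if curbit == 32:
    search complete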
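Here is the same state machine as a runnable Python sketch, following the rules described above. The class name `FieldSearch`, the `query()`/`update()` split, and the fake `answer()` bus are assumptions made for the example; the point is only that all the state lives in `value` and `curbit`, so the caller decides when the next query goes out.

```python
class FieldSearch:
    """Re-entrant field search: one query per step, state in two integers."""

    def __init__(self, bits=32):
        self.bits = bits
        self.value = 0
        self.curbit = bits - 1

    @property
    def done(self):
        return self.curbit >= self.bits

    def query(self):
        """Return (value, maskbits) for the next question to put on the bus."""
        return self.value, self.bits - self.curbit

    def _pop(self):
        # Clear contiguous 1s going upward, then set the zero above them.
        while self.curbit < self.bits and (self.value >> self.curbit) & 1:
            self.value &= ~(1 << self.curbit)
            self.curbit += 1
        if self.curbit < self.bits:
            self.value |= 1 << self.curbit

    def update(self, matched):
        """Advance the state from the bus answer; return a found value or None."""
        if matched and self.curbit > 0:
            self.curbit -= 1             # descend one bit
            return None
        found = self.value if matched else None
        if (self.value >> self.curbit) & 1:
            self._pop()                  # both branches tried here, back up
        else:
            self.value |= 1 << self.curbit   # try the 1 branch next
        return found


def answer(stations, value, maskbits, bits=32):
    """Stand-in for the bus: does any station match the upper maskbits?"""
    shift = bits - maskbits
    return any((sig >> shift) == (value >> shift) for sig in stations)


def enumerate_bus(stations, bits=32):
    fs, found = FieldSearch(bits), []
    while not fs.done:
        value, maskbits = fs.query()
        hit = fs.update(answer(stations, value, maskbits, bits))
        if hit is not None:
            found.append(hit)
        # other packets could go out on the bus here, between queries
    return found


print([hex(s) for s in enumerate_bus([0xDEADBEEF, 1, 0x80000000])])
```

In real use the `while` loop would not exist; each pass through the main loop would call `query()` and `update()` once and then get on with other work.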

The search pattern generated by this algorithm is identical to the recursive technique above, but the state is contained completely in the variables rather than the code pointer.  This means I can intersperse other packets on the bus while the search continues independently.  In my case the add_station() code actually tells the station in question to stop responding to these queries, so unless more stations show up on the bus between runs, most of the time will be spent repeating the first two queries to no avail (0xxxxxx? 1xxxxxx? Bueller? Bueller?).  Since the purpose of this state of the product is to keep catching new stations being physically attached to the bus, most query sequences will only find a single station.


Continuing work on “nano-curses”


As part of the progression on my main contract (which has another rev of PCBs and parts on their way), I’m diving back into my “nano-curses” project.  I’ve got the basic operations on stdscr working, including a semi-smart refresh() algorithm (it moves the cursor after any gap in changed characters, rather than only after gaps bigger than the size of the move-cursor command itself).  Now I’m figuring out how to do subwindows and scrolling, as those are the features that made me look to using curses in the first place.  The biggest problem is untangling the actual intended operation of the various functions and arguments.  For instance, when creating a subwindow, the documentation says nothing about what happens if the x,y,width,height places any of the window outside the parent window.  I had to dive into the ncurses code in order to determine that such a situation is an error condition, and should fail to allocate the subwindow at all.  I don’t yet have the faintest idea what’s supposed to happen in regards to refresh()ing the stdscr when a subwindow has been scrolled.  A naive form would just internally scroll the screen buffer and leave it to the refresh() algorithm to redraw it all.  However, that completely ignores the ability of the “physical” terminal to do subwindow scrolling.  I’m still tracking down how ncurses keeps track of child windows…
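The semi-smart refresh() is easy to sketch. This is Python rather than the C that runs on the chip, and the function name, the list-of-strings screen buffers, and the return-a-string interface are all assumptions for illustration. The only real-world detail it relies on is the standard ANSI/xterm cursor-position sequence, ESC [ row ; col H, with 1-based coordinates.

```python
def refresh(old, new):
    """Return the characters to send so a terminal showing `old` shows `new`.

    `old` and `new` are equal-sized lists of equal-length strings, one per row.
    A cursor move is emitted after every gap in the changed characters.
    """
    out = []
    cursor = None   # (row, col) the terminal cursor sits at, if known
    for row, (old_line, new_line) in enumerate(zip(old, new)):
        for col, (before, after) in enumerate(zip(old_line, new_line)):
            if before == after:
                continue                 # unchanged cell: leaves a gap
            if cursor != (row, col):
                # Cursor is not already here, so move it (1-based coords).
                out.append(f"\x1b[{row + 1};{col + 1}H")
            out.append(after)
            cursor = (row, col + 1)      # printing advances the cursor
    return "".join(out)


print(repr(refresh(["abcdef"], ["abXdeY"])))
```

A smarter version would compare the gap length against the length of the escape sequence and just re-send the unchanged characters when that is cheaper; this sketch deliberately does not.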


Writing “nano-curses” for microcontrollers


My main contract these days is developing into a software stack that needs to both send a lot of debugging and status data, and receive commands over the serial debug port (which in the next generation will be Bluetooth…).  The problem is, I need to be able to deal with this in the main codebase in a relatively sane manner.  That means using xterm command sequences, but hiding them behind some kind of API. The obvious candidate would be curses, of course.

So, I’m developing a microcontroller version of curses, that’s intended to be as lightweight as possible given the complexity of the problem.  The first major cost is the terminal buffer, which has to be at least a byte for every character on the screen.  I might experiment later with a non-buffered curses, especially as this is not intended to run over “slow” connections in the first place (at least for my application).  The real challenge is going to be figuring out things like subwindow scrolling rules, and all the quirks like keeping track of the cursor when a tab is inserted, etc.

I plan on releasing this code as open-source once it’s reasonably usable, and I have somewhere on my as-yet-nonexistent website to put it…