Archive for June, 2010


Getting started with Xmega: differences from ATmega (part 2)


In Part 1 I explained some of the high-level differences between the older ATmega and the newer Xmega chips.  This includes things like pinout cleanliness, enhanced peripheral count, and much less arbitrary overlap between functions.  In Part 2, I’ll be delving deeper into the architectural changes that result from this design, and how they make writing software for the Xmega much more manageable.

The issue at hand now is not where the peripherals are placed physically on the chip, but how they’re interacted with by software, logically.  As with any other microcontroller, this is done via registers.  These are specific locations in memory (or sometimes a “third” address space, besides memory and code) that when read from or written to will cause some behavior within the peripheral that the register is associated with.  For instance, writing to a USART data register will typically push the written byte into a temporary buffer and start transmitting that byte over the serial port.  Reading from the same register will pull from a different temporary buffer and retrieve the byte that was most recently received.  Other registers contain flags, such as the Transmit Enable flag in one of the USART’s control registers.

To start off with, we’ll again go back to the venerable ATmega*8 as used in the Arduino.  Let’s list all the registers that have anything to do with any of the Port D pins, and what their register address is:

  • 0xC6 UDR0
  • 0xC5 UBRR0H
  • 0xC4 UBRR0L
  • 0xC2 UCSR0C
  • 0xC1 UCSR0B
  • 0xC0 UCSR0A
  • 0xB4 OCR2B
  • 0xB3 OCR2A
  • 0xB2 TCNT2
  • 0xB1 TCCR2B
  • 0xB0 TCCR2A
  • 0x7F DIDR1
  • 0x7B ADCSRB
  • 0x70 TIMSK2
  • 0x6E TIMSK0
  • 0x6D PCMSK2
  • 0x69 EICRA
  • 0x50 ACSR
  • 0x48 OCR0B
  • 0x47 OCR0A
  • 0x46 TCNT0
  • 0x45 TCCR0B
  • 0x44 TCCR0A
  • 0x3D EIMSK
  • 0x3C EIFR
  • 0x3B PCIFR
  • 0x37 TIFR2
  • 0x35 TIFR0
  • 0x2B PORTD
  • 0x2A DDRD
  • 0x29 PIND

That’s a lot of registers!  Now while I’m not going to claim that the Xmega uses particularly fewer registers than the ATmega, I challenge you to tell me quickly what every one of those registers does…  In comparison, the registers needed for Port D on an Xmega:

  • PORTD.
    • IN
  • TCD0.
    • CTRLA
    • CTRLE
    • TEMP
  • TCD1.*
  • USARTD0.
    • DATA
    • STATUS
  • USARTD1.*
  • TWID.
    • CTRL
    • MASTER.
      • STATUS
      • BAUD
      • ADDR
      • DATA
    • SLAVE.
      • STATUS
      • ADDR
      • DATA
      • ADDRMASK
  • SPID.
    • CTRL
    • STATUS
    • DATA

Now this is somewhat more comprehensible.  Yes, there are a metric ton more registers, but they all represent significantly enhanced capabilities.  More importantly, they’re all grouped very clearly by module.  If you want to use the first USART on Port D, you start by setting USARTD0.CTRLA, and work from there, rather than trying to remember UCSR0A.  Good luck remembering which UCSR* goes with which port on a bigger chip like the ATmega128…

You’ll notice that both TCD1 and USARTD1 aren’t enumerated, but just listed with a *.  That’s because they have the exact same registers as their D0 counterparts (except that the TCx1 timers drop the 3rd and 4th compare registers).  Compared to the ATmega, that’s a major bonus: all the peripherals are the same, both between multiple instances in the same chip and between chips in the series.

Delving even deeper, let’s look at how the SPI port is described first in the ATmega8 header file:

/* SPI */
#define SPCR    _SFR_IO8(0x0D)
#define SPSR    _SFR_IO8(0x0E)
#define SPDR    _SFR_IO8(0x0F)

…and now how it’s defined in the Xmega headers:

/* Serial Peripheral Interface */
typedef struct SPI_struct
{
    register8_t CTRL;  /* Control Register */
    register8_t INTCTRL;  /* Interrupt Control Register */
    register8_t STATUS;  /* Status Register */
    register8_t DATA;  /* Data Register */
} SPI_t;
#define SPIC    (*(SPI_t *) 0x08C0)  /* Serial Peripheral Interface C */
#define SPID    (*(SPI_t *) 0x09C0)  /* Serial Peripheral Interface D */

In the Xmega, every peripheral is given a block of register space, and all the individual registers are allocated within that block.  The SPID register block looks exactly like the SPIC, and SPIE, and SPIF register blocks, except for the starting address.  Thus, as far as SPI is concerned, the only difference between the ATxmega*A4 and ATxmega*A3 headers is the following:

#define SPIE    (*(SPI_t *) 0x0AC0)  /* Serial Peripheral Interface E */
#define SPIF    (*(SPI_t *) 0x0BC0)  /* Serial Peripheral Interface F */

A major side effect of all these structures is that you can now easily construct functions and other code structures that can operate on one of these peripherals purely by address:

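void spi_init(SPI_t *port) {
    port->CTRL = 0xd0;    /* control register of whichever SPI block was passed in */
    port->STATUS = 0x80;  /* status register of the same block */
}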
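To make the "purely by address" point concrete, here is a minimal sketch that runs on a PC rather than on the chip.  The names fake_spic, fake_spid and spi_init_all() are stand-ins invented for this example (on real hardware you would pass &SPIC and &SPID from the Xmega headers), and register8_t is redefined locally so the file compiles without the AVR toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the register type used by the Xmega headers. */
typedef volatile uint8_t register8_t;

/* Same layout as the SPI_t shown above. */
typedef struct SPI_struct {
    register8_t CTRL;     /* Control Register */
    register8_t INTCTRL;  /* Interrupt Control Register */
    register8_t STATUS;   /* Status Register */
    register8_t DATA;     /* Data Register */
} SPI_t;

/* Hypothetical stand-ins for SPIC and SPID: ordinary variables instead of
 * fixed hardware addresses, so the sketch can run anywhere. */
SPI_t fake_spic, fake_spid;

/* Same body as spi_init() above: all it needs is the block's address. */
void spi_init(SPI_t *port) {
    assert(port != NULL);
    port->CTRL = 0xd0;
    port->STATUS = 0x80;
}

/* Hypothetical helper: configure every SPI block a board happens to use. */
void spi_init_all(SPI_t *const ports[], size_t count) {
    for (size_t i = 0; i < count; i++) {
        spi_init(ports[i]);
    }
}
```

Calling spi_init_all() with an array holding &fake_spic and &fake_spid configures both blocks with the same code path; nothing in spi_init() knows or cares which instance it was handed.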

With simple dot notation, you can significantly simplify your hardware configuration:

#define LED_PORT     PORTC
#define LED_RED_bp   0
#define LED_GREEN_bp 0
#define LED_BLUE_bp  0

// . . .

In comparison, the same for the ATmega8 would be:

#define LED_PORT    PORTC
#define LED_DIR     DDRC
// . . .

If you are using peripherals more complex than just an output port, you can imagine how obnoxious it would get to #define every individual register just to keep track of which of several similar peripherals (e.g. USARTs) a given logical device is using.

This particular feature has saved me uncounted hours on the product I’m developing, by allowing me to keep the same codebase across multiple revisions of the hardware.  “hardware.h” contains a switchout that loads “hardware-rev1.h” or “hardware-rev2.h” or whichever.  All these files list the same board-level peripherals (the LED, the debug and RS-485 ports, I2C for the clock PLL, etc.), and as long as I use them exclusively in my actual code, all I have to do when I switch around the hardware layout is to generate a new file and change which ports and such are referenced.
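A rough sketch of what such a switchout can look like, written so it compiles on a PC.  Everything specific here is an assumption made for illustration: the HARDWARE_REV build flag, the pin numbers, the led_red_on() helper, and the cut-down PORT_t with only DIR and OUT (the real Xmega PORT_t has many more registers).  In a real project the two #if branches would live in hardware-rev1.h and hardware-rev2.h.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the Xmega PORT_t; only the two registers used here. */
typedef struct {
    volatile uint8_t DIR;  /* data direction */
    volatile uint8_t OUT;  /* output value */
} PORT_t;

/* Ordinary variables standing in for the real PORTC and PORTD blocks. */
PORT_t fake_portc, fake_portd;
#define PORTC fake_portc
#define PORTD fake_portd

/* hardware.h: select the board revision (HARDWARE_REV is a hypothetical flag). */
#ifndef HARDWARE_REV
#define HARDWARE_REV 2
#endif

#if HARDWARE_REV == 1
/* would be hardware-rev1.h */
#define LED_PORT    PORTC
#define LED_RED_bp  0
#elif HARDWARE_REV == 2
/* would be hardware-rev2.h: the LED moved to another port and pin */
#define LED_PORT    PORTD
#define LED_RED_bp  5
#else
#error "unknown HARDWARE_REV"
#endif

/* Application code only ever uses the board-level names. */
void led_red_on(void) {
    LED_PORT.DIR |= (uint8_t)(1u << LED_RED_bp);  /* make the pin an output */
    LED_PORT.OUT |= (uint8_t)(1u << LED_RED_bp);  /* drive it high */
}
```

Moving the LED for a new board revision then means touching one pair of #defines; led_red_on() and everything that calls it stay exactly as they are.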

(Part 3: how this peripheral interchangeability can make your project design radically more flexible)


Getting started with Xmega: differences from ATmega (part 1)


To start out this series on getting started with Atmel’s new Xmega chips, I first need to explain what it is that makes it an upgrade from the original AVR ATmega chips.  While there are a lot of common elements, the combination of a large number of peripherals and the mechanisms Atmel provides to connect them all together makes for a very powerful chip.  The Xmega chips are capable of things that an ordinary AVR can only dream about.

For reference, let’s start with the configuration of the ever-popular ATmega*8, the core of the Arduino series:

Here we have a color-coded diagram showing the pins with all their alternate functions.  There are a total of 3 ports, only two of which have all 8 bits.  Port C is missing PC7 entirely, and PC6 is generally unavailable as it is multiplexed with the RESET pin, required to reprogram the device.  Port B will be lacking PB6 and PB7 in most applications, as they are multiplexed with the crystal driver.  In addition, notice that the pins of a given port are not only scattered around in various places on the chip, but not necessarily even in order.  The ATmega*8 does better than some, and certainly light-years better than any PIC I’ve seen, but it’s still a routing challenge waiting to happen.

Even more than the pin orderings, notice the fact that there’s only one serial port, one I2C (sorry, TWI) port, and one SPI port.  Three timers give a total of 6 potential PWM outputs if you don’t need the timers for anything else, and you don’t need the SPI port that overlaps 2 of them.  Analog is spread between the 6 ADC pins on Port C (two of which are lost if you need I2C), and the comparator steals another PWM output from a different port entirely.  However a major upgrade to the ATmega*8 series versus previous generations is the addition of the PCINT* capability.  Instead of being stuck with just INT0 and INT1 for external interrupts, every single pin can be configured to trigger one of a cluster of interrupts.

Now let’s look at another popular AVR in a bigger package, the ATmega1284:

This looks a lot better, due in part to the larger package.  Not only do we get all 8 bits of every port, but they’re actually all in order.  We gain an additional serial port (TXD/RXD1) though without synchronous capability (no XCK1).  A couple more ADC pins are available, since Port A is complete and the TWI pins have moved elsewhere.  We’re still stuck with only 6 PWM’s, but only one of them is potentially unavailable, and only if the SPI module is used in slave mode (since SS# can be moved anywhere when the chip is in master mode).  The RESET# and XTAL pins have also moved to their own dedicated pins, so that’s even fewer lost pins, though with the drawback that we end up “losing” two pins if there’s no crystal attached.

Now let’s take a look at the ATxmega*A4, the smallest of the new line:

Right off the bat we notice a slight change: the package is no longer DIP, but TQFP.  This is the main drawback of the chips: they’re only available in surface-mount packages.  However, I’ve rectified that by developing (and selling) adapters that convert the chips into standard DIP pinouts: (insert link here).

The next thing you should notice is a preponderance of highlighted pin functions.  Instead of 9 or 11 “major” alternate functions (serial, TWI, SPI), we have 27.  This chip has 5 serial ports, 2 TWI ports, and 2 SPI ports.  Even better, every single one is identical from a software perspective, but more on that later.  We also see a total of 12 ADC inputs, and even two DAC outputs!  Spread between ports C through E we find 16 PWM outputs, and the diagram doesn’t even bother showing the “PCINT” functionality, because every single pin of every single port is capable of various types of interrupts.  The crystal pins are available for use as a normal port (R) if you only need the internal oscillators.

A key feature is the fact that the programming pins are completely dedicated to the task.  Marked in purple above, RESET# and the CLK/DATA pins are all that are needed to program the Xmega chips (besides reference power and ground).  These pins are never multiplexed with anything else, so no more careful wiring of the SPI port so you can still flash the chip…

On the bigger end of things, we have the ATxmega*A1 chips:

Being the largest chip in the series it has 100 pins.  You should be able to click on the above image to get a larger one you might be able to read the labels on…

Working from ports A to R, this chip has: 16 inputs on 2 separate ADCs, 4 outputs on 2 separate DACs, 8 serial ports, 4 TWI ports, 4 SPI ports, 24 PWM outputs, and a memory interface capable of both SRAM and SDRAM up to 16MB.  A “timer” crystal connection is available on the extra 4 pins at the top just in case.

The pin arrangement is very clean, with every port in order around the chip, all contiguous, and all running in the same pin order (though the same can’t really be said of the BGA version, Atmel has been made aware of the serious flaws in pin placement there…).  There are power and ground pins for every port, capable of 200mA each.  In particular, that makes the chip capable of driving 20mA on every single pin simultaneously, a potential boon for those using discrete LEDs.

(Part 2: structural differences in how registers are managed make the plethora of peripherals more manageable)


Binary Field Search – re-entrant re-implementation of re[-]cursion


(say that i++ times fast!)

As part of the main product I’m working on right now, the master node must be able to identify and communicate with multiple nodes on a shared bus in an addressable manner.  This requires a mechanism to enumerate the devices in a reasonably controlled time based on their serial number or other signature data.  Because it’s a shared medium, asking all the stations to respond just ends up with a complete jumble on the bus.  Stepping through the entire potential space saying “does 1 exist? does 2 exist? does 3 exist? …” is wildly impractical given the nominal 32-bit search space – asking 4+ billion times isn’t in our budget…

The solution is a derivative of the standard binary search.  Normally you’re only looking for a single value, so you just keep checking digits until you find the one correct value.  For 8-bit, you’d start with “greater than 0x80, or less?”.  If less than, “greater than 0x40, or less?”.  If greater, “greater than 0x60, or less?”, rinse, repeat.  The challenge of a field search rather than a [single] value search is that once you’ve homed in on the first value, you have to back up and continue searching the rest of the space.

I previously implemented a strictly recursive version for ease of development (since I was fighting with the mechanism of asking the question at the same time).  It’s a fairly trivial extension of a value search: instead of breaking all the way out after finding a full match, it doesn’t.  Here’s [pseudo] pseudo-code for it, with the highlighted code being the difference between value and field search:

(beware I’m writing this off the top of my head and I can almost guarantee it’s not strictly correct)

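# check all stations for *one or more* matches of the upper (maskbits) bits with (value)
bool question(uint32_t value, int maskbits):
  mask = 0xffffffff << (32-maskbits)
  for station_signature in all stations:
    if (station_signature & mask) == value:
      return True
  return False

# start with search(0x00000000, 31); bits above curbit are already decided
bool search(uint32_t value, int curbit):
  if curbit == -1:
    return True    # all 32 bits matched (field search: add_station(value) here)
  value[curbit] = 0
  if question(value, 32-curbit) and search(value, curbit-1):
    return True    # value search only: a field search drops this return
  value[curbit] = 1
  if question(value, 32-curbit) and search(value, curbit-1):
    return True    # value search only: a field search drops this return
  return False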


A value search returns up through the entire recursive stack as soon as it finds the first match.  Remove the return stack and let it keep going, and you’ve got a field search.

The problem is that this implementation is monolithic, and runs until completion.  For a large enough number of stations it can take many hundreds of queries before it finds all the nodes.  For the product in question that’s not a fatal problem, but I’d really like to be able to send other packets on the bus with a particular regularity that would be precluded by having the search run to completion.  As a result, a recursive technique isn’t in the cards since I have no threading stack on the chip.  That means developing a re-entrant form that works on the same principle.

What I’ve developed (with no pretenses that it’s new, FWIW) is a fairly basic state machine that uses the value and curbit variables alone, rather than in combination with the current program counter as it steps from the clear to set portions of search().

The basic rules are simple: every call generates a query based on the current value and curbit, then modifies them as appropriate depending on the success of the query and the bits present in value.  The initial state is value = 0x00000000 and curbit = 31.  For each iteration:

  • If the query matches and we’re not on the last bit, simply move to the next bit down (curbit--) and start over.
  • If the query matches and we are on the last bit, we’ve found a station.  If that bit is a zero, set it to one before starting over.  If it’s a one, we “pop” all contiguous 1’s off the end (e.g. 01001011 pops twice) while resetting them to zero, change the zero above them to a one, and start over.
  • If the query doesn’t match, we check the current bit.  If it’s a zero, we change it to one and start over.  If it’s a one, we “pop” all the ones and change the preceding zero to a one before starting over.
  • If we pop to a bit that doesn’t exist (in this case 32), we’re done with the entire search space.

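pop():
  while curbit < 32 and value[curbit] == 1:
    value[curbit] = 0
    curbit++
  if curbit < 32:
    value[curbit] = 1

if search():                  # one query using the current value and curbit
  if curbit == 0:
    add_station(value)        # every bit matched
    if value[curbit] == 1:
      pop()
    else:
      value[curbit] = 1
  else:
    curbit--
else:
  if value[curbit] == 1:
    pop()
  else:
    value[curbit] = 1
if curbit == 32:
  done                        # popped off the top: whole space covered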

The search pattern generated by this algorithm is identical to the recursive technique above, but the state is contained completely in the variables rather than the code pointer.  This means I can intersperse other packets on the bus while the search continues independently.  In my case the add_station() code actually tells the station in question to stop responding to these queries, so unless more stations show up on the bus between runs, most of the time will be spent repeating the first two queries to no avail (0xxxxxx? 1xxxxxx? Bueller? Bueller?).  Since the purpose of this state of the product is to keep catching new stations being physically attached to the bus, most query sequences will only find a single station.
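For anyone who wants to play with the idea, here is a small C sketch of the state machine described above.  It is not the product code: the station signatures are made up, question() just scans an in-memory array instead of talking to a bus, and the field_search_* names are invented for this example.  The only state carried between calls is the value/curbit pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated bus: made-up station signatures. */
const uint32_t stations[] = { 0x00000001u, 0x4000BEEFu, 0x4000BEF0u, 0xFFFFFFFFu };
const size_t station_count = sizeof stations / sizeof stations[0];

/* Does any station match `value` in its upper `maskbits` bits (1..32)? */
bool question(uint32_t value, int maskbits) {
    assert(maskbits >= 1 && maskbits <= 32);
    uint32_t mask = 0xFFFFFFFFu << (32 - maskbits);
    for (size_t i = 0; i < station_count; i++) {
        if ((stations[i] & mask) == value) return true;
    }
    return false;
}

/* The entire search state; nothing lives on the call stack between steps. */
typedef struct {
    uint32_t value;
    int curbit;  /* 31..0 while searching, 32 once the space is exhausted */
} field_search_t;

void field_search_start(field_search_t *s) { s->value = 0; s->curbit = 31; }
bool field_search_done(const field_search_t *s) { return s->curbit >= 32; }

/* Clear the run of ones starting at curbit, then set the zero above it.
 * If there is no zero left, curbit ends up at 32 and the search is over. */
void field_search_pop(field_search_t *s) {
    while (s->curbit < 32 && ((s->value >> s->curbit) & 1u)) {
        s->value &= ~((uint32_t)1 << s->curbit);
        s->curbit++;
    }
    if (s->curbit < 32) s->value |= (uint32_t)1 << s->curbit;
}

/* Issue exactly one query.  Returns true, with the signature stored in
 * *found, when this step completed a full 32-bit match. */
bool field_search_step(field_search_t *s, uint32_t *found) {
    assert(s->curbit >= 0 && s->curbit < 32);
    bool bit_set = ((s->value >> s->curbit) & 1u) != 0;
    bool hit = false;
    if (question(s->value, 32 - s->curbit)) {
        if (s->curbit > 0) {
            s->curbit--;        /* descend; the newly exposed bit is still 0 */
            return false;
        }
        *found = s->value;      /* last bit matched: this is a station */
        hit = true;
    }
    /* Either nothing matched here or a station was just reported: move on. */
    if (bit_set) field_search_pop(s);
    else s->value |= (uint32_t)1 << s->curbit;
    return hit;
}
```

A main loop would call field_search_step() once per time slot, sending whatever other packets it likes in between, until field_search_done() reports true.  Because the zero branch is always tried before the one branch, stations come out in ascending signature order, just as they would from the recursive version.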


Continuing work on “nano-curses”


As part of the progression on my main contract (which has another rev of PCB’s and parts on their way), I’m diving back into my “nano-curses” project.  I’ve got the basic operations on stdscr working, including a semi-smart refresh() algorithm (it moves the cursor after any gap in changed characters, rather than any gap bigger than the size of the move-cursor command itself).  Now I’m figuring out how to do subwindows and scrolling, as those are the features that made me look to using curses in the first place.  The biggest problem is untangling the actual intended operation of the various functions and arguments.  For instance, when creating a subwindow, the documentation says nothing about what happens if the x,y,width,height places any of the window outside the parent window.  I have to dive into ncurses code in order to determine that such a situation is an error condition, and should fail to allocate the subwindow at all.  I  don’t yet have the faintest idea what’s supposed to happen in regards to refresh()ing the stdscr when a subwindow has been scrolled.  A naive form would just internally scroll the screen buffer and leave it to the refresh() algorithm to redraw it all.  However, that completely ignores the ability of the “physical” terminal to do subwindow scrolling.  I’m still tracking down how ncurses keeps track of child windows…
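The bounds rule itself is simple enough to sketch.  This is only an illustration of the check, not the ncurses implementation: nwin_t and subwin_fits() are hypothetical names, coordinates are assumed to be absolute screen positions, and a zero or negative size is simply rejected here rather than given any special meaning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal window record; not the ncurses WINDOW layout. */
typedef struct {
    int y, x;           /* top-left corner, in screen coordinates */
    int height, width;  /* size in character cells */
} nwin_t;

/* True when a child of the given size placed at (y, x) lies entirely
 * inside the parent.  A caller would refuse to allocate the subwindow
 * when this returns false. */
bool subwin_fits(const nwin_t *parent, int y, int x, int height, int width) {
    assert(parent != 0);
    if (height <= 0 || width <= 0) return false;
    if (y < parent->y || x < parent->x) return false;
    if (y + height > parent->y + parent->height) return false;
    if (x + width > parent->x + parent->width) return false;
    return true;
}
```

With a 24x80 parent at the origin, a 4x10 child at row 20, column 70 just fits, while making it one row taller pushes it off the bottom and the check fails.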


Xmega USART fractional baud-rate spreadsheet


The default baud rate on the Bluetooth adapters I’m using to program and debug the current generation of board for my main contract is 115.2K.  That’s actually rather slow when shoving 50-80KB onto the chip every time I make a code change.  The adapters are capable of up to 921.6K, but even at 32MHz a normal USART baud-rate generator ends up with a particularly ugly error percentage (8.5% as it happens, well outside the allowed 2.0%).  However, the Xmega has a fractional baud-rate generator.  I’m not actually sure how it operates, but I know it’s capable of generating much more accurate serial clocks.

Because the calculations are rather tedious, I designed a simple spreadsheet to tell you what the usable BSEL and BSCALE values are for a given rate.  Plug in your main clock rate and target baud rate, and it’ll show you the viable combinations.  For 32MHz 921.6Kbaud, the BSCALE has to be set to -2 or lower, with -7 providing a combination that’s only off 0.1% of nominal.
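The arithmetic behind a spreadsheet like this can be sketched in C.  The sketch assumes the baud-rate formulas as given in the Xmega A manual for asynchronous mode with CLK2X = 0 (worth double-checking against the datasheet for your part): for BSCALE >= 0 the rate is fPER / (2^BSCALE * 16 * (BSEL + 1)), and for BSCALE < 0 it is fPER / (16 * (2^BSCALE * BSEL + 1)).  The function and struct names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int bscale;        /* -7..7 */
    unsigned bsel;     /* 0..4095 */
    double baud;       /* rate actually generated */
    double error_pct;  /* signed error relative to the target, in percent */
} baud_setting_t;

/* Baud rate produced by a BSEL/BSCALE pair (asynchronous, CLK2X = 0). */
double xmega_baud(double f_per, unsigned bsel, int bscale) {
    if (bscale >= 0) {
        return f_per / ((double)(1u << bscale) * 16.0 * ((double)bsel + 1.0));
    }
    return f_per / (16.0 * ((double)bsel / (double)(1u << -bscale) + 1.0));
}

/* Find the nearest BSEL for one BSCALE.  Returns false when the ideal
 * BSEL does not fit in the 12-bit field. */
bool xmega_baud_solve(double f_per, double target, int bscale, baud_setting_t *out) {
    assert(bscale >= -7 && bscale <= 7);
    assert(f_per > 0.0 && target > 0.0);
    double ratio = f_per / (16.0 * target);
    double ideal = (bscale >= 0)
        ? ratio / (double)(1u << bscale) - 1.0
        : (ratio - 1.0) * (double)(1u << -bscale);
    if (ideal < 0.0) ideal = 0.0;        /* target is at or beyond the fastest rate */
    if (ideal > 4095.0) return false;    /* BSEL is only 12 bits wide */
    out->bscale = bscale;
    out->bsel = (unsigned)(ideal + 0.5); /* round to nearest */
    out->baud = xmega_baud(f_per, out->bsel, bscale);
    out->error_pct = (out->baud - target) / target * 100.0;
    return true;
}
```

For a 32MHz clock and a 921.6Kbaud target, BSCALE = 0 gives BSEL = 1 and exactly 1Mbaud, which is the 8.5% error of a plain integer divider.  BSCALE = -7 gives BSEL = 150 and an error a little under 0.1%.  Looping bscale from -7 to 7 and printing each result reproduces the kind of table the spreadsheet shows.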


A peek at complex PCB layout


Attached is a shot of the main microcontroller board for my current contract.  It’s the second revision in this form factor (4th overall), with an eye towards both switching to a “smaller” (yet more RAM!) microcontroller and fixing some power problems.  Those of you who were at the last DorkbotPDX meeting saw the previous revision in the flesh.

The board has an ATxmega192a3 on the top, CDCE913 PLL and 6V->3.3V switcher on the bottom, plus a ring of connectors on both top and bottom which connect it to two other boards in the stack.

As you can see it isn’t done, but it’s getting a lot closer (and cleaner) than my previous attempt, which had the PLL on the bottom right and the programming connector on the top left stacked on top of each other.  Some shuffling of all the microcontroller connections (yay Xmega!) allowed me to flip things around and clean stuff up quite dramatically.

The catch is: the board is 1.25″ round…


Mental exercise: tiny acceleration logger


When watching shows like Mythbusters, Prototype This, and Smash Lab, you regularly see them using accelerometers in order to test the shock characteristics of a given event.  The problem is that these loggers are invariably big (roughly 6″ cube by volume) and fraught with problems such as “triggered, hurry up before we run out of log space!”.  Given the current crop of really tiny accelerometer chips (such as the Freescale MMA7455 mentioned on dorkbotpdx, for $2 quantity 1), microcontrollers, flash memory and batteries, it seems almost criminal that they aren’t using something radically smaller and smarter.

So just for fun, I decided to toss some parts together and see if they would actually fit on an incredibly tiny package.  The design is not complete, as it is missing any charging capability and more importantly a Bluetooth uplink module, but it’s a good start:

I call it “rev -1” because of the missing features.  However, it’s a whopping 1.125″ x 0.5″, sized to match a tiny LiPoly cell from Sparkfun.  It’s got an ATmega48 core, an MMA7455 accelerometer capable of up to ±8g, and an 8Mbit dataflash capable of roughly 15 minutes of continuous recording at the maximum 250Hz sampling rate.  A tact switch is used to turn it on etc., and an LED just under that shows status.

The next steps would be to add the Bluetooth module (e.g. this one also from Sparkfun), some kind of charging circuit, and possibly a USB-A connector to make it into a “USB key” for both data extraction and recharging.  I’d drop the huge battery connector in favor of soldering it on, and put the Bluetooth (and maybe some other parts) on the opposite side of the board, but I still think it’d fit.  I’d want to find a solid power-management chip designed to handle USB input and LiPoly charging that’s also insanely tiny, but these days the manufacturers are so keyed in on making multifunction chips in stupidly-small packages that it’s probably going to be harder to find one big enough to do on the group-order design rules.

Total retail price on this module would likely be in the $75-100 range apiece, but the combination of size and potential feature set should be hard to beat.