Banish the MVP?

MVP is, alas, an ambiguous acronym.  So let’s start by taking care of the obligatory MVP, MVP. What About MVP? MVP! MVP! MVP. MVP. What about MVP? MVP’s me. What about me?

But that’s not all that’s ambiguous about MVP.  MVP, or “Most Vexing Parse” to be clear, is all about unexpected ambiguity, and one tiny rule in the C++ spec that causes behavior totally contrary to what most programmers expect.

Consider the following, at file scope (global or namespace, doesn’t matter):

int a(5);

That clearly defines a variable a which is initialized to 5.

int f();

That’s a prototype (forward-declaration) of a function which accepts no arguments and returns a value of type int.

int x = int();

Clearly value-initialization: x is initialized to zero. (For a class type, this would invoke the default constructor.)

Now, inside a function:

int main(int argc, char** argv)

{

  int b(0);

  int c();

  int d = int();

}

Contrary to the expectation of most programmers, nothing has changed AT ALL except scope.  b and d are variables, now local, which are initialized to zero, one explicitly, one by value-initialization.  And c is, like it or not, a function prototype, not a constructor call.

Big deal, you say; sure, that's unexpected and somewhat annoying, but the workaround is one extra character to make the zero explicit. What could be easier?
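Spelled out, the one-character workaround (a trivial sketch):

int c(0); // one extra character: now unambiguously a variable, initialized to zero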

Well, under the right circumstances, which happen to be

  • a requirement for value-initialization of a local variable
  • whose type is a template parameter
  • which could be a user-defined type with an inaccessible copy constructor

it becomes a HUGE pain.

Let's start at the end: the rule that finally resolves this is the definition of value-initialization (§ 8.5 p 10 in the C++0x draft), and the full discussion is at stackoverflow.com/questions/2671532/non-copyable-objects-and-value-initialization-g-vs-msvc/2671960#2671960.
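To make the template case concrete, here is a minimal sketch (mine, not code from the Stack Overflow thread; all names are illustrative):

class Guard
{
public:
    Guard() {}
private:
    Guard(const Guard&); // copy constructor inaccessible
};

template <typename T>
struct ValueInitialized
{
    T value;
    ValueInitialized() : value() {} // mem-initializer value-initializes directly, no copy involved
};

template <typename T>
void useLocal()
{
    T a();                 // most vexing parse: declares a function, not a variable
    // T b = T();          // value-initializes, but copy-initialization formally requires
                           //   an accessible copy constructor in C++03, so T = Guard breaks
    ValueInitialized<T> c; // c.value is value-initialized for any T, even Guard
}

int main()
{
    useLocal<int>();
    useLocal<Guard>();
    return 0;
}

The wrapper struct works because an empty pair of parentheses in a constructor's mem-initializer list is direct value-initialization: no temporary, no copy.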

Optimization for fun and SO reputation

A post on Stack Overflow raises the question of how to optimize this code:

Color GetPixel(int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    return Color.FromArgb(this.imagePtr[offsetFromOrigin + 2],
                          this.imagePtr[offsetFromOrigin + 1],
                          this.imagePtr[offsetFromOrigin]);
}

void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    for (int x = 0; x < Width; x++)
    {
        for (int y = 0; y < Height; y++)
        {
            byte pixelValue = image.GetPixel(x, y).B;
            this.sumOfPixelValues[x, y] += pixelValue;
            this.sumOfPixelValuesSquared[x, y] += pixelValue * pixelValue;
        }
    }
}

It looks like a calculation of the variance (with respect to time) of a sequence of frames; this could be part of a motion detector or something like that.

Now, there are a few .NET-specific things to look at. 

A lot of answers keyed in on the name GetPixel(x, y) without noticing the custom version the OP was using. That's a pretty important point, since Image.GetPixel is really slow, but it didn't apply in this case because Image.GetPixel wasn't being called.  As an aside, the Win32 native GDI function GetPixel is also dog slow; native programmers should use GetDIBits if the algorithm requires processing data in a particular pixel format, or GetObject to get the raw data without conversion.

Another is the use of System.Drawing.Color, which is a value type, and value types break almost all .NET optimizations, at least in 32-bit mode, though that's now fixed (for people who install recommended Windows Updates).  Since in this case the use of Color is totally unnecessary, we'll just make that problem disappear:

byte GetPixelBlue(int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    return this.imagePtr[offsetFromOrigin];
}

But those are all problems very specific to the environment and libraries being used, and a great example of why all programmers should know what's underneath the abstractions they use. I wanted to discuss something with more widespread application.

The only parts of this particular example that continue to be relevant are:

  • two-dimensional data
  • a moderately large data set
  • no data dependencies between iterations of the loop (this is commonly called being "embarrassingly parallel")

All problems with two- (or more) dimensional data of non-trivial size create potential for problems with locality of reference.  In this particular example, the additional information given was that Width = 640 and Height = 480, and there are 200 such images being accumulated into the output statistics.  Therefore the total size of the source data is:

480 rows * 640 pixels/row * 3 bytes/pixel = 921,600 bytes, or approximately 1 MB

Next let’s look at the destination arrays.

There's a rule of thumb that says modern processors work fastest on the native word size (though x86_64 is equally optimized for 32-bit int), so use that instead of smaller variables.  That's great advice for local temporary variables, but the rule DOES NOT APPLY when you have more than 10,000 or so elements (here we have approximately 300,000 per array).  What data type is actually appropriate?

Well, sumOfPixelValues accumulates a value from 0 to 255 from each of the 200 images into each element, so the total fits into 16 bits.  But that would break if there were ever more than 257 images processed (65,535 / 255 = 257), which is not a very big safety margin for future growth.  There are two options then: use 32 bits to provide additional safety margin, or assert on the number of images (the assertion is outside the processing loop, so no big performance impact).

sumOfPixelValuesSquared gets a value from 0*0 to 255*255 from each image, so 32 bits are definitely needed for image counts from 2 all the way up to about 2^16.  An assertion is optional here, but probably still the best way to document the assumption.
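Here's a sketch of what documenting those assumptions might look like (written in C++ for concreteness; the function name and limits are mine, derived from the arithmetic above):

#include <cassert>

void CheckAccumulatorCapacity(int imageCount)
{
    // 16-bit sums: at most 255 added per image, so 65535 / 255 = 257 images maximum
    assert(imageCount <= 257);

    // 32-bit sums of squares: at most 65025 added per image,
    // so 4294967295 / 65025 = roughly 66,000 images maximum
    assert(imageCount <= 66051);
}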

Assuming a 16-bit sumOfPixelValues and a 32-bit sumOfPixelValuesSquared, these arrays are 0.6 MB and 1.2 MB respectively, for a total data size of nearly 2.8 MB.  On a reasonably modern general-purpose computer this fits within main memory comfortably, but even the latest CPUs (e.g. Core i7 or Phenom II) have only about 2 MB of L3 cache per core, 256-512 KB of L2 cache, and 32-64 KB of L1 data cache.  So this is a data set size where locality of reference is going to mean the difference between RAM access and L1 cache, which represents about an order of magnitude difference in bandwidth.

Let's look at the cache behavior of the code as written.  Each iteration of the inner loop accesses three distinct memory locations, as well as several member variables which hopefully have been read into registers.  (If any calls are made to opaque functions, the compiler has to pessimistically assume the member variables have changed and re-read them; use local copies to re-enable the optimization.)

void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    uint[,] sums = this.sumOfPixelValues;
    uint[,] squares = this.sumOfPixelValuesSquared;

    for (int x = 0; x < Width; x++)
    {
        for (int y = 0; y < Height; y++)
        {
            byte pixelValue = image.GetPixel(x, y).B;
            sums[x, y] += pixelValue;
            squares[x, y] += pixelValue * pixelValue;
        }
    }
}

source: imagePtr[y * stride + x * 3], where stride = 640 pixels/row * 3 bytes/pixel (already on a four-byte boundary), or offset 3 * (y * 640 + x)

destination 1: sums[x, y], which is at offset sizeof(sums[0,0])* (x * 480 + y)

destination 2: squares[x, y], which is at offset sizeof(squares[0,0])* (x * 480 + y)

Now, the inner loop increments y, so two successive iterations hit memory words which are 3*640 bytes apart in the source and adjacent in the destination.  Since cache lines are on the order of 64-128 bytes each, these are definitely not in the same cache line.  After 480 passes through the inner loop, the outer loop increments x.  In that time 480 cache lines have been used reading the pixel data, which is enough to ensure total cache replacement.  Therefore there will be 480 cache misses on each of the 640 iterations of the outer loop, or over 300,000 cache misses per frame.  All this for an operation that theoretically requires only 300,000 multiplies and 600,000 additions (on data), plus roughly the same number again for calculation of effective addresses.

Let’s try a really simple change, swap the two loops:

public void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    uint[,] sums = this.sumOfPixelValues;
    uint[,] squares = this.sumOfPixelValuesSquared;

    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            byte pixelValue = image.GetPixelBlue(x, y);
            sums[x, y] += pixelValue;
            squares[x, y] += pixelValue * pixelValue;
        }
    }
}

Now the inner loop increments x, so two successive iterations hit pixel bytes which are only 3 bytes apart.  These will fall into the same cache line 90%+ of the time, and sequential motion through the arrays gives predictive prefetchers a fighting chance to have the next cache line loaded and ready when it is needed.  Even leaving the inner loop moves (for the source array) from offset 3 * (y * 640 + 639) to 3 * (y + 1) * 640, which is still perfect locality of access.  The source data stays in L1 cache essentially all the time, keeping the arithmetic units busy.

But the destination arrays now suffer from the problem we just fixed for the source.  We'll have to change the memory layout of the result arrays to make the offset (y * 640 + x) instead of (x * 480 + y); here's the code:

public void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    uint[,] sums = this.sumOfPixelValues;
    uint[,] squares = this.sumOfPixelValuesSquared;

    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            byte pixelValue = image.GetPixelBlue(x, y);
            sums[y, x] += pixelValue;
            squares[y, x] += pixelValue * pixelValue;
        }
    }
}

The use of 2-D arrays in this code means the offset calculation is being performed on every access, even though we know consecutive accesses are to adjacent elements.  Hopefully the optimizer is smart enough to figure this out within the inner loop, but it probably isn't for the outer loop.  It isn't that difficult to switch to pointers and gain control over the calculations:

unsafe void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    byte* scanline = image.imagePtr;
    fixed (uint* sumsBase = &this.sumOfPixelValues[0, 0])
    fixed (uint* squaresBase = &this.sumOfPixelValuesSquared[0, 0])
    {
        // pointers declared by fixed are read-only, so copy
        // to locals that we can increment as we walk the arrays
        uint* sums = sumsBase;
        uint* squares = squaresBase;
        for (int y = 0; y < Height; y++)
        {
            byte* blue = scanline;
            for (int x = 0; x < Width; x++)
            {
                byte pixelValue = *blue;
                *sums += pixelValue;
                *squares += pixelValue * pixelValue;
                blue += 3;
                sums++;
                squares++;
            }
            scanline += image.stride;
        }
    }
}

Now all the multiplies for calculating array element effective addresses are gone.

Note that any sort of functional language or matrix library should have this sort of thing figured out.  Equivalent Matlab code would be:

blue = image{:,:}.b;

sums = sums + blue;

squares = squares + blue.*blue; % or blue .^ 2

This would have run at least 4-5 times faster than the original version, just because it traverses the arrays in the right order.  For raw number crunching, it's pretty hard to beat a tool designed for the purpose.  But sometimes the tool is missing other features, like a powerful user interface framework.  You can do UI in Matlab, but it isn't much fun and won't look as professional as Windows Forms or WPF.  Don't let that be the only reason to use a general-purpose language though: with C# (or C++) we actually do squeeze out more performance than Matlab.  The reason is that the final C# code loops over the source array only once, while Matlab will loop over it twice.  Telling Matlab to do both calculations in the same pass just isn't an option; since Matlab is an interpreted language, you have to call its vector builtins and avoid rolling your own loops, otherwise we're talking a factor of 1000 or more slowdown.

There's another potential performance killer present: cache collisions.  The code iterates through three arrays simultaneously.  If the cache mapping algorithm puts data from source and destination elements into the same cache location, each array would be evicted from cache in order to access the others, causing continual cache misses again.  This won't be as severe as the original problem, because the element size is different for each array, and modern CPUs have associative caches which mitigate the problem.  With three arrays, a 4-way (or more) associative cache avoids the need for eviction on collision (for reference, the Core i7 is 8-way associative in its L1 and L2 caches).  With a low-level language like C++, you could move the arrays around in memory to avoid collisions; good luck trying that with Matlab, Java, or C#.  C++ could also take advantage of SIMD instructions to process multiple elements per clock cycle.  Matlab and Java might do this automatically, but (on Microsoft's runtime engine) C# and the other .NET languages will not.

Finally, there's a low ratio of actual work to looping in this code.  Partially unrolling the inner loop (say by a factor of 16) would improve throughput as well as making it easier to switch to SIMD.  This might speed up the code by another factor of 2-3.  Hopefully standard syntax for SIMD will be added to a future version of the C++ standard, just as C++0x adds atomics which formerly were very platform-specific, but for now use of SIMD instructions comes at the expense of portability.  Or vectorizing compilers may come to the rescue, for code which is written in a way that allows it.  Multi-threading could also increase throughput due to the inherent parallelism, but due to the overhead of starting new threads you'd want a thread-pooling task manager, or to exchange the loop order again, this time bringing into play the frame iteration that I assume is implemented in the caller of this function.
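To make the unrolling concrete, here is a sketch in C++ (unrolled by 4 to keep it short; the signature and names are mine, not the OP's, and it assumes the row-major destination layout from the final version above):

#include <cstdint>

void Accumulate(const uint8_t* scanline, int width, int height, int stride,
                uint32_t* sums, uint32_t* squares)
{
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* blue = scanline;
        int x = 0;
        for (; x + 4 <= width; x += 4)   // unrolled: four pixels per pass
        {
            uint32_t p0 = blue[0], p1 = blue[3], p2 = blue[6], p3 = blue[9];
            sums[0] += p0;  sums[1] += p1;  sums[2] += p2;  sums[3] += p3;
            squares[0] += p0 * p0;  squares[1] += p1 * p1;
            squares[2] += p2 * p2;  squares[3] += p3 * p3;
            blue += 12;
            sums += 4;
            squares += 4;
        }
        for (; x < width; ++x)           // remainder, when width % 4 != 0
        {
            uint32_t p = *blue;
            *sums++ += p;
            *squares++ += p * p;
            blue += 3;
        }
        scanline += stride;
    }
}

The four independent p0..p3 chains give the out-of-order hardware (or a vectorizing compiler) something to chew on, which is exactly why unrolling eases the move to SIMD.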

Intel and AMD chip designers do a great job of transparent optimizations like prefetching and branch prediction that yield good performance on all kinds of code.  But at the end of the day, abstractions fall short and the programmer is responsible for understanding and making good use of the hardware.  That’s why writing code the right way is called “software engineering”.

Technical Documentation, like Code, does not follow English rules for grammar and style

The purpose of technical documentation is to precisely convey information to other people who possess the technical skills needed to apply it.  It is not to pass along technical skills, although reading technical documentation on a system designed by someone more experienced often will.  It is not to explain a system to management to enable their decision-making processes, that need is properly met by the inclusion of executive summaries and project plans in software specifications alongside the technical design.  And it is decidedly not to make the spelling and grammar check in your word processor happy.

Of course, documentation must include explanations in a human language such as English, and broken rules of English grammar and style are distracting, so breaking them gratuitously should be avoided.  But the potential for conflict with accuracy should not be ignored.  In order to be clear when documenting code, it is necessary to follow the rules established by the programming language for identifying particular components.  Only where the programming language makes no restriction (such as in comments, or the English prose surrounding references to the software system) should literary rules reign.

Joseph Newcomer has a great anecdote of a programmer who tried to write C++ code using English grammar.  But documentation can become complicit in such foolishness if it disregards the rules of the language, as does the Apple Developer Documentation.  Here is an excerpt taken from Apple’s man page for select on 1 August 2009:

First, the prototype, which has been correctly preserved:

int select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds, fd_set *restrict errorfds, struct timeval *restrict timeout);

Next, the explanatory content, in which accuracy has been sacrificed on the altar of English grammar:

DESCRIPTION Select() examines the I/O descriptor sets…

RETURN VALUES Select() returns the number of ready descriptors…

BUGS Select() should probably have been designed to…

I'm truly at a loss for why Apple should be describing the Select() function in a document dedicated to select().  No C compiler accepts the altered capitalization, and neither should programmers.  Oh wait, Apple was just following the example of FreeBSD, which messed up a decade earlier.  Of course, FreeBSD is free, while Apple is selling their OS and therefore should be held to a higher standard.  For reference, the normative specification of select (and its updated version) is case-correct while also managing to obey English capitalization rules.  This is the ideal.

Maybe next, in the interests of using a larger vocabulary to avoid boring repetition, the cat man page can be rewritten using words such as "feline".  head and tail likewise have perfectly good synonyms (in English only, /bin/sh isn’t accepting substitutes).  Just don’t ask me to build products on a foundation that plays such tricks with documentation.

User Interface

Strictly speaking, the EPCS main function of toggling valves at set intervals doesn't require any user input, and it's likely to be quite some time before an algorithm is embedded to provide real-time results, so a display is not strictly necessary either.  But a device without a user interface is an inflexible one, and the development and debugging process will be greatly eased by an integrated display.

I did some looking around on Digi-Key for a small inexpensive LCD panel, but what impressed me most about all of them is that the documentation was utter garbage. Then I stumbled across an entire front panel on fire sale. Ok, it isn't perfect: I could have wished for 20 characters across and a backlight, more pushbuttons, or a green LED, but it's more than adequate and the price was ideal. The seller's comment "I can't begin to make such an item for this price." is, if anything, an understatement.

Experience has taught me that electronics tend to fail during development (most circuits deal in at least a few voltages that are higher than the absolute maximum rating of something else, and with loose wires, incorrect cable pinouts, or short circuits induced by multimeter probes… use a little imagination), and I've learned the value of immediately-available spares, so I've been accumulating enough of each component to build two units. Actually I would much have preferred two complete units plus a spare of each component, but with free samples (often necessary to get around minimum quantity requirements) you take what you can get, which is usually two. For the front panel I ordered two extra for a total of four; this price point doesn't cause me much pain and I'm sure I'll find a home for them before too long.

Anyway, the supporting circuitry for this front panel module turns out to be really minimal, but that wasn’t immediately evident. Here are the issues I considered, addressed in order of fewest dependencies:

  • LEDs: The front panel has them wired in a common-cathode configuration, which is not my preference, and the discussion is too long to fit here comfortably.
  • Momentary Pushbuttons: The front panel has four, with one contact of each grounded and the other floating, the perfect configuration for small numbers of buttons. Add a pull-up resistor and the floating contact becomes a nice logic-level signal, ready to feed logic. The ADuC7025 GPIO pins have internal 100k pull-up resistors so we could wire up the buttons directly and be done with it. I’ve chosen instead to use the MAX6818 de-bouncer IC, not so much for the de-bouncing which can be provided easily in software, but for the 15kV ESD protection and overvoltage protection, which can’t. The Change-of-State output is nice for generating an interrupt, even if the active state is wrong for direct use as an ADuC7025 external interrupt request line. I see two ways to overcome this: one of the unused channels of the hex inverter used for LED current sourcing, or the method I’ve chosen, the interrupt-generation capability of the ADuC7025’s integrated Programmable Logic Array. This is the more complex route, but enables tricks like automatic inhibition of the button-press interrupt while the shared data bus is in use talking to another peripheral. For this reason any output-enable signals will be placed on PLA input pins.
  • Character LCD: This has been just a tad frustrating. The front panel uses an original Hitachi HD44780 requiring a 5V supply, but the internet seems to only have the datasheet for the newer HD44780U, which supports a 3.3V supply as well. That's ok for getting data back to the ADuC7025, which has 5V-tolerant GPIOs, but it raises a question about whether the 2.4V high output voltage of the ADuC7025 (probably nearer 3V when run from a 3.3V supply, still less than 0.7 VCC) will be considered logic high or an intermediate, unstable, high-current level. Anything above 2.2V is clearly ok for the HD44780U when run from a 5V supply, but what about the older model? Luckily someone posted the relevant parts of the datasheet, and its threshold is also 2.2V. This means there is no need to increase the voltage with either common-drain configured GPIOs and power-wasting pull-up resistors, or a bidirectional level translator such as the CD74ACT623 or SN74LVC4245A (another translator would have been needed for the control signals). On the software side, the LCD needs a specific initialization sequence; see the sketch after this list.
  • Shared data bus: We save pins by using the same data bus for the pushbuttons and LCD, but a compatibility check is in order. The bus has three devices: the ADuC7025, MAX6818, and HD44780. Do all have tri-state outputs? Yes. Are all tolerant of 5V signals from the HD44780? ADuC7025: yes. It's tempting to ignore the MAX6818 because it will be tri-stated, but this would be a mistake: its absolute maximum ratings would be exceeded unless a 5V supply is chosen. The pushbuttons are fine at 5V, and 2.4V on the EN pin is still sufficient to be considered high. Everything is in order.
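Here is that initialization sketch: the HD44780's "initialization by instruction" sequence from the datasheet, in C++ for concreteness, with WriteCommand and DelayMicroseconds as hypothetical stand-ins for the real GPIO/bus routines:

void WriteCommand(unsigned char command);  // hypothetical: RS = 0, put command on the bus, strobe E
void DelayMicroseconds(unsigned long us);  // hypothetical: timer-based or busy-wait delay

void InitializeLcd()
{
    DelayMicroseconds(40000); // wait more than 40 ms after power stabilizes
    WriteCommand(0x30);       // function set, first attempt
    DelayMicroseconds(4100);  // busy flag can't be checked yet, so wait at least 4.1 ms
    WriteCommand(0x30);       // second attempt
    DelayMicroseconds(100);
    WriteCommand(0x30);       // third attempt leaves the controller in a known state
    WriteCommand(0x38);       // function set: 8-bit bus, 2 lines, 5x8 font
    WriteCommand(0x08);       // display off
    WriteCommand(0x01);       // clear display
    DelayMicroseconds(1600);  // clear takes about 1.5 ms
    WriteCommand(0x06);       // entry mode: increment address, no display shift
    WriteCommand(0x0C);       // display on, cursor off, blink off
}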

Phew.

Common-cathode LEDs suck

Or sink. Whatever. No matter which word you use, there’s a need for the logic to source a current of 25mA (or reasonably close).

I definitely prefer LED banks with anodes tied to the positive power rail. NPN and NFET transistors are more efficient per unit area than their PNP or PFET counterparts, so interfacing LEDs can usually be done using microcontroller GPIO pins to sink the current (but be sure to check the rated sink current; the ADuC7025 makes no promises beyond 1.6mA), with a series resistor to make up the voltage difference. Open-drain logic outputs are preferred but not necessary, since the LED becomes high impedance as the cathode voltage goes high. Even when the microcontroller can't sink enough current, NFET or NPN transistors are available in quad-packs (or more) with high current gain.

With common cathode, this is not so easy. Furthermore, the common cathode is already hard-wired to ground, though that's the most reasonable thing to do with common cathode anyway. Since we need external components, my preference would be a current-controlling LED driver. If we were using LEDs with the usual common anode/switched cathode arrangement, there'd be lots of options, including programmable dimmers and so forth. But all hope is not lost: Exar makes "high-side" LED drivers with a controlled output current (PWM for dimming), with either two or three channels on a single chip. Problem: one shared enable for all channels. Huh, how's that multi-channel then? Well, these chips are designed to match current (and hence intensity) between multiple LEDs. There are no options for separate control or even single-channel versions, and if you only connect one LED the logic promises to be so kind as to detect a fault and turn off your good channel. There used to be a quad driver chip made by TI and National Semiconductor designed as the high side of a matrix scan system, the SN75491, that would have been perfect for this, but it is no longer sold. Abandon hope all ye who enter here.

So we're left with using transistors for voltage control, combined with series resistors to intersect the 3V @ 25mA spec of the LEDs (for example, switching a 5V rail, (5V - 3V) / 25mA = 80Ω in series, ignoring the transistor's own drop). Since there don't seem to be any triple- or quad-pack PNP (or PFET) transistors, one option would be discrete transistors like the 2SB1694 or FDN338P. A PFET is better, because the low current gain of a PNP leaves a respectable load on the microcontroller. But I'm a proponent of low part count anyway, so a multi-channel inverter will be an even better option as long as it has appropriate supply voltage and current source capability. Two alternatives are TI's SN74LVC3G04 and ON Semi's MC74AC04. Neither option recognizes 3.3V as logic high, and the hex inverter has lower input and supply current requirements, so I'll use it. The restrictive lower limit on logic high won't be a problem, since the ADuC7025 is 5V-tolerant and bidirectional GPIOs can always simulate open-drain outputs by setting the data direction bits while keeping the logic level low. A pull-up resistor to 5V completes the circuit.

Schematic of JTAG and serial-over-USB

As previously mentioned, the EPCS will take advantage of the FTDI FT2232H USB-serial converter and its integrated multi-protocol synchronous serial engine (MPSSE) support for JTAG for programming and debugging the ADuC7025 firmware. The second serial port provided by the FT2232H will be used in UART mode to provide a link for streaming data back to a PC. I envision two uses of this UART:

  1. The UART communicates through an RS-232 level converter and standard DB-9 connector to act as a generic USB-serial adapter attaching any RS-232 device to the PC. Simultaneously, the ADuC7025 snoops the streaming traffic, transparent to both endpoints. This enables aggregating the serial data with local ADC measurements using a single time source, streamlining later time alignment of different sensors.
  2. The UART may alternatively be used for bidirectional communication between the ADuC7025 and PC at high speed.

Since the FT2232H has two ports, creative use of tri-state buffers could allow both these modes of operation simultaneously when not debugging, but I do not feel this would be worth the added complexity. Instead we will focus on preventing leakage currents from flowing through the ground connection established by the USB cable.

To prevent leakage currents it is desirable to keep the EPCS main circuitry isolated from ordinary electronics (anything connected to the power grid without a medical grade power supply). The UART is straightforward to isolate — it has only two unidirectional data lines and moderate baud rate. I simply interposed an ADUM2201 "IEC60601-1 compliant digital isolator" and voilà, problem solved.

JTAG presents a somewhat more severe isolation problem, as there are both more pins and higher-frequency signals, so inclusion of a digital isolator would have a significant detrimental impact on the debugging experience. At the same time, firmware loading and debugging is not part of the intended use of the finished product; it is quite acceptable to debug using test data or during animal experiments and remove the JTAG capability before use with human patients. Therefore all that is needed is to provide a way to disconnect JTAG while retaining the UART link (which is useful when deployed), and a removable resistor array à la SCSI terminator resistors solves the problem neatly.

Or does it? One lurking issue here is that with two power sources, and the near certainty of cycling power to the EPCS while connected to its debugger, these JTAG signals could be energized while their receiver is not. Such a situation threatens latch-up, as parasitic P-N junctions in the receiver IC provide a path to ground with very low incremental resistance and a low turn-on voltage on the order of 0.7V. The 3.3V which the FT2232H would drive onto the JTAG bus would be plenty high enough to induce latch-up in many chips. Using a resistor array with non-zero values, such as the 47-ohm SIP selected, helps mitigate the problem by limiting the maximum current (roughly (3.3V - 0.7V) / 47Ω ≈ 55mA worst case into a parasitic junction). Ordinarily protection diodes would be added to the circuit as well, to assure that any current passes through a safer route than the parasitic P-N junctions, but the ADuC702x datasheet indicates that these microcontrollers tolerate up to +5.3V on input pins irrespective of supplied power, and likewise for the FT2232H, so it seems this is a non-issue. (Note how other pins on both chips are rated with respect to VCC or VDD, but the digital inputs are not restricted.)

Here then is the completed circuit:

JTAG Interface

Admittedly this is a higher part count than just plunking down a 20-pin ARM-standard JTAG connector as shown in the ADuC702x datasheet's example application circuit, but it is much more capable. In addition, USB 2.0 high-speed capability used to make quite a statement, but now USB super-speed is preparing for its debut and promises to quickly become the logo du jour.

JTAG Programming/Debugging Adapter

Having a microcontroller in an embedded project doesn't do much good until you are able to program it, because in contrast to fixed-function ICs or configurable ICs with reasonable defaults, microcontrollers have little to no useful behavior from the factory. Some families of microcontroller, like the Z-World Rabbit or Microchip PIC, have unique programming circuits, so your choice of programmer is essentially made for you: you buy the adapter available from the IC vendor for several hundred dollars. Most ARM cores use the industry-standard JTAG serial port for programming, debugging, and (for other devices such as FPGAs) boundary scan test. That means a lot more options for adapters, but not necessarily much money saved. For example, you can easily spend over $1000 on the Segger J-Link (pricelist in Euros), which is recommended by IAR, who make the Embedded Workbench compiler/development environment. There's a free limited version of the software, and the J-Link is available for non-commercial use at a more palatable price, but it's still quite expensive and development of a product is forbidden, so apparently the lower price isn't intended for open-source designers. Anyway, since I am developing a proprietary product, that isn't for me.

However, further digging around on the Internet produced first hope and then a budget-friendly solution. There's an open-source programming environment for JTAG named Open On-Chip Debugger, and some of the participants have put together designs for JTAG adapters along with lists of reasonably-priced commercial adapters. The DIY designs are mainly based on FTDI's USB/serial adapter ICs, which I've run into on previous projects and, interestingly enough, fought to eradicate from those designs. The reason was that the Windows drivers were buggy and caused occasional blue screens of death (ok, in some versions, which even passed WHQL certification, the BSODs came fast and frequent), which was totally unacceptable for the intended use. For programming, reliability isn't as important as convenience, and while it's hard to prove the absence of bugs, the latest drivers are improved and definitely not a hindrance, so I come to the opposite conclusion regarding using FTDI chips for JTAG.

I still probably would have chosen one of the less expensive commercial models rather than FTDI, though, because designing a board (with noise-sensitive USB traces!) isn't cost-effective by comparison, nor is placing a $6 chip on every board just to enable programming. But distributors sell not only the bare ICs but also ready-built modules which contain the USB connector, FTDI chip, EEPROM, and crystal, all routed and assembled, for only $27 (Mouser gets the link because Digi-Key has none in stock at the time of this post). Including the connector for the mini-module is just as easy as the 2×10 connector used by the commercial adapters; the module can be moved around and reused between boards; this is the high-speed version; and to top it all off, the second half of the FT2232H can be used as a UART to stream data to a PC in real time. There's still a slight question about what magic incantation to program into the EEPROM so that the software recognizes it as a JTAG device, but I'm confident that can be overcome even if I have to read the OpenOCD source code.

I’ve now posted the schematic.

Is that Instrumentation Amplifier needed?

While writing the post on selection of the microcontroller, I got a little deeper into the capabilities of the integrated ADC peripheral. Initially I had just been relying on past experience with Analog Devices ICs for assurance that if the basic specs were sufficient then my needs would be met. But now I’ve discovered that the ADuC7025 ADC provides adjustable scaling and common-mode rejection, which are two functions currently delegated to the AD8227 instrumentation amps. Yet I have a number of reasons for keeping the front-end amplifiers:

  • First, removing it wouldn’t save much. The IC is inexpensive, and the part count for supporting circuitry is only a couple components more than would be needed anyway, since the charge-sampling design of the ADC requires external capacitance.
  • The instrumentation amplifier presents a lower output impedance to the analog input pins, and its 400MΩ inputs also load the transducer less.
  • The output of the AD8227 is single-ended and can thus be made available as an analog output for recording with any other data collection apparatus. It may or may not have a strong enough driver to use directly, but I'd add an additional buffer amp to stop the effects of the external load from propagating back to the internally measured signal. For example, when the external equipment is turned off, its protection diodes will shunt incoming signals to ground, clipping at about 0.7V. However, even with a buffer there could still be ground leakage currents, so ideally any analog output would be produced using a discrete DAC updated via an isolated serial signal.
  • Fewer pins of the ADuC7025 are needed, leaving more ADC channels available for other purposes (or use of the pins as GPIO).
  • Providing gain by setting VREF affects all channels equally, whereas with a separate amplifier the gain can be set individually for channels connected to pressure transducers, force transducers, current-sense resistors, external analog signals, etc.
  • The AD8227 input pins are very well protected against overvoltage and ESD. This won’t help if somehow the input contacts experience the shock of defibrillation, but that wouldn’t happen unless badly miswired (to a degree comparable to sticking a screwdriver into a wall socket).
  • In case the maximum input voltage is ever exceeded, replacement of an IC with an SOIC-8 footprint is far easier than the 64-lead quad flat pack of the ADuC7025.

So the instrumentation amplifiers stay.

Processor

A while back, one of the embedded design newsletters I receive mentioned a kind of "super-ADC": barely more expensive than the usual ADC and about the same size, but with a microcontroller embedded (for example, the ST7FOX, S08QD, AVR ATtiny25, and P89LPC90x are all 8-pin parts, while the AVR ATtiny10 has 4 ADC channels in only 6 pins), allowing for programmable digital filters right on the front end.  Think 50/60Hz notch filters to take out noise caused by fluorescent lights (the power grid in Europe and much of the non-US world is 50Hz, the US is 60Hz, so many devices offer the user a choice).  Of course that kind of filter doesn't need to be done on the front end, but many filters do, such as software radio and CDMA, because they require a sampling rate too high to push through an isolator.  Doing the digital signal processing in the ADC, then decimating before transmitting as a low-rate, easily isolated serial stream solves the problem neatly.
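As an aside, here is roughly what such a front-end filter looks like in code: a standard second-order IIR notch biquad, sketched in C++ (the names and the pole radius are mine; passband gain normalization is omitted for brevity):

#include <cmath>

struct NotchFilter
{
    double b0, b1, b2, a1, a2; // biquad coefficients
    double x1, x2, y1, y2;     // previous inputs and outputs

    NotchFilter(double notchHz, double sampleHz, double r = 0.95)
        : x1(0), x2(0), y1(0), y2(0)
    {
        // zeros on the unit circle at the notch frequency,
        // poles just inside it (r controls the notch width)
        const double pi = 3.14159265358979323846;
        double w = 2.0 * pi * notchHz / sampleHz;
        b0 = 1.0;
        b1 = -2.0 * std::cos(w);
        b2 = 1.0;
        a1 = -2.0 * r * std::cos(w);
        a2 = r * r;
    }

    double step(double x) // process one sample
    {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// usage: NotchFilter mains(60.0, 1000.0); then call mains.step(sample) once per sample

A few multiplies and adds per sample is well within reach of even an 8-bit micro (in fixed-point form) at these sampling rates, which is the whole appeal of the super-ADC.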

Now, the newsletter, which I can no longer find, wasn't highlighting that small cheap microcontrollers have ADCs; that's been true for a long time. Rather, it was focusing on the availability of integrated ADCs with the same performance as dedicated ADC chips. So when I started the initial design work for the EPCS, I decided to go looking for a super-ADC to control it. For some reason Analog Devices and ARM stuck in my mind, so I went looking there. The ADuC series turns out not to be quite as small or low power as the article had described, but they do have ADC performance as good as a dedicated IC. Also, as I further considered the project, I decided I wanted to interface with more devices anyway, so the slightly larger size wasn't a disadvantage. Furthermore, the ARM7TDMI core is very power-efficient for what it offers; the same core is used in cell phones and PDAs that run for days on a charge. In the past, projects I've been involved with have often used Analog Devices' analog-to-digital conversion chips with great success, so I expect similar good results ADC-wise from the ADuC. I wanted good ADC resolution, low cost, and a respectable number of I/O pins for LEDs and buttons, and those requirements intersected in the ADuC7025.

ADuC7025 Functional Block Diagram

This processor has a lot of cool features and also presents a few challenges:

  • Power – Expect an upcoming post with gory details on getting suitable power and ground connections to the microcontroller.
  • JTAG – Supports programming the onboard code memory, but that’s not all. We won’t need an ICE (In-Circuit Emulator) or be reduced to printf debugging (or more usual for embedded systems, GPIO-attached LED debugging), because JTAG promises live debugging. Hooking up to a JTAG adapter, and then setting up the development environment will be topics for future posts.
  • UART – Just one UART is included, but RX and TX are multiplexed to two sets of pins. I’ll give more details on why that’s good in a future post.
  • GPIOs – Allows interacting with the real world in a myriad of ways. To start with, rather than just being a brick with tubes coming out, the EPCS will let its user know what’s going on with LEDs and a character LCD panel. We’ll even allow the care provider the illusion of control by accepting simple commands activated via push buttons. This will be described in — you guessed it — a future post.
  • GPIOs redux – The purpose of the EPCS isn’t pretty blinking lights and push buttons though, we want to actually control pneumatic flow. GPIOs provide the timing to the valve control circuit described here. (Wonder of wonders, already written)
  • ADCs – Actuators are great, but adding sensors opens up a whole new world of feedback control. We’ll start out with pressure transducers to monitor the fluid lines we’re controlling. Since the ADuC7025 supports fully differential inputs with an external voltage reference, we could use it as the front-end, but there are advantages to using an instrumentation amplifier as described here. For insight into system health we have an integrated temperature monitor, and maybe battery voltage and solenoid currents. That leaves a number of ADC channels left over which can be used to capture analog data from other monitors.
  • SPI – A future post will describe why and how I add a microSD card to the EPCS.

Well there’s plenty more work to do, but this post is long enough. On to more specific points of design.

Pneumatic Actuator

Since the whole purpose of the EPCS is to mitigate the need for a human to turn two stopcocks twice per minute, we need to replace the stopcocks with electric control, and that means a solenoid pneumatic valve. The arrangement of the stopcocks was such that one port needed to communicate with either of two pressure reservoirs, so a single 3-way valve is sufficient. I started googling for suitable valves and found lots of options in the $100 price range: $100 to replace a 50¢ piece of plastic. This wouldn't quite have exceeded the budgetary constraints of the project, but I knew that entire non-invasive blood pressure cuff systems (the kind for occasional personal use) sell for about $40, and during development work I'd rather have a spare than a really sturdy valve costing twice as much, so I kept looking and eventually found two models for about $16, call it $20 each once you factor in shipping. The catch is that the documentation available on the seller's website is sorely lacking.  I took one approach to solving that: order the valves early enough to receive them in time for some simple testing before finalizing the design. Apparently if I had tried a little harder I would have found somewhat better documentation at the manufacturer's site, but although this lets me confirm the required wattage, I'm still uncertain about the actual electrical connections. I figure there will either be a separate drive coil for each of the two energized positions, or else a single pair of contacts which receives either +12V or -12V to reach either position.

(At this point I find the real explanation, which isn't linked from the seller or even the manufacturer's product page, and finally understand. I should have asked my ME and ChE sisters, who, with a combined total of 13 years in engineering school, certainly already knew these little details.)

Aarrgh! Yes, I'm an electrical engineer, and I clearly didn't know what 3-way normally closed (NC) valves were when I ordered them. Look, 2-way valves are easy: there are two ports and two positions, open and closed. If the valve is normally closed, then it's closed when not powered, and open when energized. And if normally open (NO), it's closed when energized and open when not. So far so good. But apparently things get complicated when you get to 3-way valves. There are three ports comprising one "inlet" capable of connecting to either of two "outlets", presumably permitting flow in either direction despite the naming. That's what I expected. I ordered a normally closed valve because I thought that, like the 2-way NC valve, there would be no flow unless energized. But no: although there are NO and NC variants of 3-way valves, there are still only two positions, and both variants have exactly the same behavior; one outlet is open when energized and closed at rest, while the other outlet is closed when energized and open at rest. The only difference between the NO and NC valves is which of the two outlets is in the body and which is, well, elsewhere.

Well, the good news is that having only two positions, the valve is driven completely using a single solenoid coil, either energized to +12V or off; no -12V is required and no complicated (and power-hungry) voltage inverter. The other good news is that this valve will work. The bad news is that I have to keep the solenoid energized with about 5W for as long as I need pressure equal to the second reservoir, because as soon as the power stops the "normally open" reservoir becomes connected again. Oh, and one more piece of good news: these valves, listed as "pneumatic" by the reseller, actually work with liquids as well, which is good because many of the lab's experiments involve starting and stopping IV fluids, and flexibility is good. On the other hand, 0.9% saline might not qualify as "inert".

If I want to be able to pressurize the system and then hold the pressure by stopping all flow (both fluid and power), which would be a significant savings in power and thus an improvement in battery life, I can use a pair of normally closed 2-way valves, or the equivalent 3-position 4-way valve, which is what I had been expecting to get in the first place. Either way uses two solenoid coils at 0 or +12V, still with no need for inversion.

Anyway, these valves require 12V high-current "digital" control. Digital only in the sense that the signal should be either on or off, with no proportional control (they make fancier valves for that); "discrete" would probably be a better word, since the valves are definitely not compatible with digital logic, which outputs 3.3V at typically no more than 25mA. I found a boost regulator which sources up to 18V (adjustable) at plenty of current, and has a 3.3V-compatible digital enable input. In addition, the efficiency is nearly 90% and the quiescent current is very low at 75μA, so this won't hurt battery life. I'd feel better if the input voltage range extended a little beyond 6V, but the maximum ratings indicate it's safe up to 7V, and it should be rare for a battery to be over its rated voltage. TI provides a very nice example circuit as well as detailed information on selecting the external components. I'll use as many of the example values as possible and just adjust the output voltage to 12V, while using the more efficient lower-frequency option.

Here is the circuit for up to two valves:

Pneumatic Actuator

Wow, that's a lot of components for something so simple.  I can't imagine how bad it would be without an IC doing the boost conversion; probably I would just use a 12V battery.  I should ask my sisters that inertness question before I forget.

Power budget: 1mW continuous + 4.5W at 10% duty cycle, for an average of roughly 0.45W. The duty cycle drops to 2% (about 90mW average) if I substitute the 2-way valves.
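For completeness, the firmware side of the valve timing is nearly trivial. A sketch in C++ (the helpers are hypothetical placeholders for the real GPIO and timer code, and the 30-second cycle is illustrative, derived from the twice-per-minute stopcock routine and the 10% duty cycle above):

void SetValveEnable(bool on);        // hypothetical: drives the boost regulator's enable input
void DelayMilliseconds(unsigned ms); // hypothetical: timer-based delay

void ValveLoop()
{
    for (;;)
    {
        SetValveEnable(true);    // energize: connect the second reservoir
        DelayMilliseconds(3000); // 10% of a 30-second cycle
        SetValveEnable(false);   // de-energize: the "normally open" reservoir reconnects
        DelayMilliseconds(27000);
    }
}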