c0re_dump (c0re_dump) wrote,

Assembly Language FTW!

Someone at Dorkbot Bristol found an interesting project that's going on at the University of California at Irvine. It's called Digital Voices, and it's all about making mobile computers communicate using sound. Now, using sound for communication is not new; modems have been using the technique since the 1960s. The new idea is to make the sounds pleasing to the human ear, maybe to the extent that we hardly notice them. Some of the examples that have been tried so far sound like bird song or crickets chirping. Other, more obvious examples sound like R2-D2 or encode digital data into music.

The communications theory is all very technical, and we have a weighty tome by Sklar that goes into all the mathematics of it. With that book, one could design a 56kbps modem pretty much from scratch (given enough time to understand the maths). Being more of a code-hacking type, I had a look at the Java programs that the Digital Voices project had published. Imagine my horror when I discovered that they'd called both sin() and cos() in the innermost loop! In a piece of code that must work in real time to detect audio tones in a digitised sample! That would never do...

So, suitably charged with righteous indignation at the wasted processor cycles in such a program, not to mention the use of a high level language, I set about making it go faster. I won't rant about the use of a language that compiles into P-code, because some Java compilers don't do that, and they really are quite good nowadays -- and anyway, we're talking about calling a trigonometric function in the run-time library here, from a tight loop. No, I'm going to rant about pre-computing the trig stuff in advance and coding the tight innermost loop in assembler. Oh, I just did. OK.

The innermost loop in the tone detection program performs a convolution. That is, it multiplies the samples in the incoming audio with samples of a reference sine wave and cosine wave, then adds them up. Hence, the process is often called multiply-accumulate. DSP (Digital Signal Processing) chips have special, very fast instructions to do multiply-accumulate, and even the ARM chip has one. But I want to use an Atmel AVR, the ATmega8 (or an ATmega168 or ATmega328, as found in the Arduino). Now, the AVR has an eight-bit multiply instruction that executes in two cycles, so maybe we have a chance of getting this to work.
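For the curious, here is a minimal sketch of that multiply-accumulate loop in plain C rather than AVR assembler. It is not the code from the project: the names (`sin_ref`, `cos_ref`, `correlate`, `N_SAMPLES`) are made up for illustration, the samples are assumed to be already converted to signed 8-bit values, and the reference tables cover only one special case, a tone at exactly one eighth of the sample rate, so that a single cycle fits in eight precomputed entries.

```c
#include <stdint.h>

#define N_SAMPLES 256  /* assumed buffer length */

/* One full cycle of the reference waves for a tone at sample_rate/8,
   computed ahead of time as round(127 * sin(2*pi*k/8)) and the matching
   cosine, so no trig function is called at run time. */
static const int8_t sin_ref[8] = {   0,  90, 127,  90,    0, -90, -127, -90 };
static const int8_t cos_ref[8] = { 127,  90,   0, -90, -127, -90,    0,  90 };

/* Multiply-accumulate the samples against both reference waves.
   Each 8-bit by 8-bit product fits in 16 bits; the running sums
   need 32 bits (at most 256 * 128 * 127). */
void correlate(const int8_t *samples, int32_t *sum_sin, int32_t *sum_cos)
{
    int32_t s = 0;
    int32_t c = 0;

    for (uint16_t i = 0; i < N_SAMPLES; i++) {
        int16_t x = samples[i];
        s += (int16_t)(x * sin_ref[i & 7]);  /* i & 7 wraps the 8-entry table */
        c += (int16_t)(x * cos_ref[i & 7]);
    }

    *sum_sin = s;
    *sum_cos = c;
}
```

A general detector would need one pair of tables per tone frequency, or a longer table indexed with a per-tone phase step; the point here is only that the trig work moves out of the loop.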

AVR microcontroller board: LEDs plugged in

But first, I must acquire 8-bit audio data at something like an 8kHz sample rate. Grabbing 256 samples takes 32ms, and they occupy 256 bytes of RAM, or one quarter of the ATmega8's RAM. OK so far: the chip's analog-to-digital converter can do that, and I can double-buffer the data with a timer interrupt to keep the audio flowing through the program. So now I have 32ms to do all those multiplications and additions, 256 of them for the sine and 256 more for the cosine, for each frequency that I want to detect. Then, I have to square the results (two 32-bit multiplications) and add them together. If the result is bigger than some pre-determined threshold, I've detected a tone.
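The square-and-compare step can be sketched in C like this. The function names and the 64-bit result width are assumptions for the sake of a portable example; on the AVR itself one might scale the sums down first to keep the arithmetic narrower, and the threshold would have to be found by experiment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signal power at the tone frequency: sum_sin^2 + sum_cos^2.
   Squaring a 32-bit sum can overflow 32 bits, so this sketch
   widens to 64 bits before multiplying. */
uint64_t tone_power(int32_t sum_sin, int32_t sum_cos)
{
    int64_t s = sum_sin;
    int64_t c = sum_cos;

    return (uint64_t)(s * s) + (uint64_t)(c * c);
}

/* A tone counts as detected when its power exceeds the threshold. */
bool tone_detected(int32_t sum_sin, int32_t sum_cos, uint64_t threshold)
{
    return tone_power(sum_sin, sum_cos) > threshold;
}
```

Adding the two squares makes the result independent of the phase of the incoming tone, which is why both a sine and a cosine reference are needed.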

Scope Photo of Overlapped I/O

Well, I needn't have worried about the speed of the code, because my convolution program takes 550μs to process each tone. That's enough time to detect over 50 separate tone frequencies at once. I only need eight to begin with, for DTMF tone-dialling detection. It all worked well enough to take it to Dorkbot last week and demonstrate it with a signal generator, amplifier, microphone and preamp.
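The time budget works out as follows, written as a small C sketch. The constant and function names are hypothetical; the figures are the ones quoted above, and the eight frequencies are the standard DTMF row and column tones.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ 8000u  /* sample rate from the post */
#define BUFFER_SAMPLES 256u   /* samples per buffer */
#define US_PER_TONE    550u   /* measured processing time per tone */

/* Standard DTMF frequencies in Hz: four row tones, four column tones. */
static const uint16_t dtmf_hz[8] = {
    697, 770, 852, 941, 1209, 1336, 1477, 1633
};

/* Time taken to fill one buffer, in microseconds. */
uint32_t buffer_period_us(void)
{
    return (uint32_t)BUFFER_SAMPLES * 1000000u / SAMPLE_RATE_HZ;
}

/* How many tones can be processed while the next buffer fills. */
uint32_t max_tones_per_buffer(void)
{
    return buffer_period_us() / US_PER_TONE;
}

/* Processing time needed for all eight DTMF tones, in microseconds. */
uint32_t dtmf_load_us(void)
{
    return (uint32_t)(sizeof dtmf_hz / sizeof dtmf_hz[0]) * US_PER_TONE;
}
```

With these numbers the buffer period is 32000μs, which leaves room for 58 tones, and the eight DTMF tones take 4400μs of each period.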
Tags: audio, avr, convolution, dorkbot, dsp