XMOS Board + Wavefront DSP Board

Started by markseel, February 12, 2012, 08:49:15 PM

Previous topic - Next topic

markseel

#20
Some progress, some changes too.  I quit working on it for a while but I'm about ready to get going on it again.

It will still support streaming stereo PCM audio to the PC in real time as well as audio effects processing using the CODEC's stereo analog in and out at 24-bit 48kHz.

1) I decided to remove the AL3102 parts and do the DSP on the XMOS.  It has plenty of power (400 MIPS, single cycle 32x32=64 MAC).
2) I've been able to communicate with the Bluetooth Low-Energy part via a UART.  I'm also able to interact with it from the PC using the BLE GATT protocol.  A colleague has been able to interact with it from iOS (test application running on the iPad and iPhone 4s).
3) I have the XMOS and support circuitry (FLASH, oscillator, Vcore switching vreg, Vio linear vreg, power-on-delay, XMOS MCU) on a separate 36-pad 19x19mm surface mount module - boards came last week, should be running next week.
4) I'm laying out the PCB that contains the XMOS module, Bluetooth LE module and the audio CODEC right now.  It has connections for analog in and out, power, XMOS JTAG, and GPIO's for switches and rotary encoders used to select and adjust effects parameters.  It's about 2.0" x 1.1".
5) I started some of the DSP code.  Lots to do still.

I'll post some pictures of the XMOS module and the effects board soon.

markseel

#21
Here are the latest boards.  The effects board is on the left and contains the Bluetooth LE module, XMOS module, and the AK4556 CODEC and linear Vreg.  The two images on the right show the XMOS module that I'm building now for use in various projects.

The effects board (left):
The 11 pins at the top are for the voltage supply (3.6 to 5.5V), the XTAG debugger (5 pins), and high-speed serial links (4 pins) to the PC (for streaming audio input if wanted).
The 6 pins at the bottom left are for ground, stereo input and output, and a clean 3.3V for the analog circuitry.
The 11 pins on the bottom right are for effects memory preset selection (3 pins for eight presets) and two rotary encoders (8 pins) for parameter selection and value changing.
The remaining two pins on the bottom right will probably be used for TX/RX for an OLED display or something similar.


markseel

#22
Tested the XMOS module - works!  ;D



I also finished a fractional fixed-point base-16 exponentiation function ( y = 16 ^ (-x) ) that will be used numerous times in the overdrive/distortion effect.  It supports up to 32-bit fractional precision using an optimized algorithm.

tp1936

Impressive work!
Very interested in the Xmos stamp module.  2 layers or four layers?  Are you planning on selling these modules, or are you planning to publish the schematics?

Thanks

markseel

#24
The PCB is 19mm x 19mm, 2 layers, with an immersion gold (ENIG) finish.

I'd like to make more of the XMOS stamps for my projects and for other hobbyists.
But I don't want to do the PCB assembly in very high volumes.
I'd be willing to put some together for others if the volume is low - they'd cost around $25 (PCB + parts + time).

I looked into having them built.  I could have them assembled for just under $10 per stamp in volumes of 100.
Parts in volume would also cost around $10.  So that's about $20 if I could produce these in larger numbers.
I'm not sure if it will come to that though.  But again, if anyone wants one - let me know.  :)

markseel

#25
Here's a simple schematic for hooking up a stereo CODEC and the XTAG debugger.


mhelin

#26
Mark, quick comments.  First, the XMOS stamp is a great idea, and I think you could get it assembled if you did it the way some people do on diyaudio.com: organize a group buy.  Another thing is that you might want to use the new XS1-SU1 USB-enabled part (which can't be easily soldered without machines).  I also think some flash memory should be put on the stamp PCB for storing the program and NV data, which means the usual pins used for interfacing flash (as in the XK-1A kit) are kind of reserved - I noticed that you used them on the schematic above to connect to the codec.

Here's the SU part data sheet:
https://www.xmos.com/download/public/XS1-SU01A-FB96-Datasheet%28X7199E%29.pdf

Not sure how the OTP can be used (I think it can only be programmed once), but it might be possible to develop firmware for downloading a new program and flashing it to the NV flash device (though an Arduino-like loader at the beginning of flash memory should work fine).

Also it would be nice to have one or two audio boards available later on - maybe a stereo one and a multichannel one, the latter for USB audio recording and digital speaker crossover applications.  Anyway, the most inexpensive XMOS board available costs $49 (without the JTAG / programmer device), so $25 for a stamp isn't really much.

mhelin

Btw. it seems someone else has also been planning an XMOS stamp (don't know if there's been any progress):

http://solderpad.com/folknology/xs1-su1-stamp/

Digital Larry

Hi Mark,

Serendipitously ran into you at DesignWest last week in San Jose.  Check out some sample screens from my upcoming "SpinCAD Designer" that I'm developing for the Spin FV-1.

http://www.spinsemi.com/forum/viewtopic.php?t=378

It would be a boatload of work, but this could be adapted to the Xmos chip.

Digital Larry
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

mhelin

#29
Related to this topic there is now a new startKIT development board from XMOS.  It's not yet available, but when it is it will cost only $14.99:

http://www.xmos.com/startkit
http://www.digikey.com/product-detail/en/XK-STK-A8DEV/880-1066-ND/4485710

They were also giving some away for free; I don't know if any are left though, or when they will be delivered.

The startKIT is compatible with this add-on audio board:
http://www.digikey.com/product-detail/en/XA-SK-AUDIO/880-1043-ND/3622823

So together they make a nice development platform for guitar FX, too.

There is also an SDRAM add-on module, but to be able to use the audio and SDRAM modules at the same time you will need the bigger sliceKIT board:
http://www.digikey.com/product-detail/en/XK-SK-L2-ST/880-1041-ND/3622821

Well, if someone from XMOS ever googles him/herself to this page, please consider introducing the above kind of kit with the JTAG debugger, SDRAM and audio cards instead of the GPIO and Ethernet cards.
You can of course get all the cards and the base card as single items.  The core board is just a little bit expensive if you haven't got the JTAG yet:
http://www.digikey.com/product-detail/en/XP-SKC-L2/880-1042-ND/3622822


markseel

#30
Good points mhelin.  I've used XMOS for a few years now for both professional work as well as hobby stuff.  I thought the micros and tools were so good I went to work for XMOS (eight months now)!!!

I'll add to your comments while trying to stay objective  ;D

Just for background here's some terminology and facts related to XMOS micros:

1) An XMOS micro-controller consists of xCore tiles.
2) XMOS micro-controller parts currently available contain one or two xCore tiles clocked at 500 MHz.
3) Each xCore tile contains 4, 6 or 8 independent 32-bit CPU's, 64K SRAM, 32x32=64 multiplier, JTAG interface, timers, clocks, smart GPIO's, 8K OTP memory.
4) Each 32-bit CPU communicates to other CPU's via message passing channels.  You can also use a shared memory approach if desired.
5) The CPU's on each tile are round-robin scheduled and can idle (wait for a timer, message or GPIO event) which frees MIPs for other cores.
6) The 32x32=64 multiplier runs at 500 MHz.
7) Single-tile parts offer 500 MIPS distributed across the 4, 6 or 8 cores
8 ) Dual-tile parts offer 1000 MIPS distributed across 8, 10, 12 or 16 cores
9) There are no I2C/SPI/I2S/UART/etc peripherals - all peripherals are defined by software
10) The GPIO's are high speed and support timed input/output, serialization/de-serialization, time-stamping, etc
11) 32-bit cores + Smart GPIO's = Just about any peripheral you can dream up
12) Some XMOS parts (the U-series) contain a USB PHY supporting USB 2.0 HS

OK, that was a lot of info.  Hope it makes sense so far.

Typically the firmware development environment consists of xTIMEcomposer (the programming IDE supporting editing, compiling, simulation, timing analysis, loading/flashing, debugging, etc), a USB JTAG emulator called XTAG ($19 from Digikey) that supports high-speed low-latency debugging, and your target board containing some number of XMOS micro-controllers.

The startKit is a bit different though - it contains a two-tile part (remember each tile has 4/6/8 cores) and does not need a separate XTAG board.  The device on the startKit is a two-tile 16-core device.  The first tile is actually used as the XTAG and the second tile is used as your target micro (resulting in you having an 8-core micro-controller for your applications).  So it's equivalent to using a separate USB/XTAG board and a single-tile XMOS micro-controller.  But with the startKit it's all on one board - all you need is the startKit ($14.99), a USB cable (comes with the kit), the freely available xTIMEcomposer development tools, and the free software (called xSOFTip) downloadable from www.xmos.com!

http://www.xmos.com/en/startkit#D1WEaWZP

As mhelin mentioned there's free software to implement an SDRAM interface, I2C/SPI interface, I2S/TDM interfaces for ADC's/DAC's/CODEC's, LCD controller, Ethernet interface, TCP, and lots of other stuff.  You can download the xTIMEcomposer (use version 13beta for the startKit) for free and use the xSOFTip browser viewer within xTIMEcomposer to view all of our free software components.  You can then just drag them right into your project!

I think the startKit would be a great foundation for effects.  It has a RaspberryPi connector as well as a PCI-e slot for adding your own expansion board (for pots, your CODEC, and what-not) or for adding XMOS slice boards.  There's slice boards for Audio and Ethernet interfaces among others.

Some more cool stuff ...

If you'd like to start developing applications for an XMOS micro-controller and you don't have a board then download the free tools and use the simulator!  The xTIMEcomposer IDE can simulate your code and is 100% clock accurate (due to XMOS's proprietary and fully deterministic 32-bit cores).  From within the IDE you can use the logic analyzer and oscilloscope functions to view results of your simulations as well as time your code and check for meeting real-time constraints.  You can also use the oscilloscope view to observe your code running on a target board in real-time via the high-speed XTAG.

Digital Larry

Hi Mark,

The Xmos chips look pretty cool, and the price on the demo boards is ridiculously low.

I have a question about how best to use the multiple cores.  My DSP experience up to now mostly consists of programming the FV-1 where there aren't very many variations on what you can do and you're limited to 128 instructions per effect patch.  This simplicity comes with some benefits of course.

With a DSP system inevitably you are compelled to do "X" per sample period and this is where using multiple cores gets a bit confusing for me.  I presume that access to cores can come via some sort of hidden (that is, built-in) process or thread scheduler.  So how do you make it work more like a dedicated DSP chip where most of the time, I am just wanting to execute the same instructions once per sample, while other things (scanning pots or programming ports) occur at a slower rate?  Do I have to use 8 cores or could I (if the application permitted it) simply use fewer cores?  I presume that only one core can actually be executing at a time so the main benefit of having many cores is support for threading rather than multiplying available CPU cycles.

Thanks,

DL
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

mhelin

#32
I see it (Mark surely knows this better) so that the purpose of the cores is just to add parallelism.  You can think of the cores as threads, or if you know Intel Hyper-Threading
(http://www.intel.com/info/hyperthreading) I'd say it's very close to it.

Now regarding the DSP, why not look at an example:

Here is the main function for biquad processing example (@xcore git):
https://github.com/xcore/sw_audio_effects/blob/master/app_slicekit_biquad/src/main.xc

int main (void)
{
    streaming chan c_aud_dsp; // Channel between I/O and DSP coar
    par
    {
        on stdcore[AUDIO_IO_TILE]: audio_io( c_aud_dsp ); // Audio I/O coar
        on stdcore[DSP_TILE]: dsp_biquad( c_aud_dsp ,0 ); // BiQuad filter coar
    }
    return 0;
}


Btw. what is "coar"? Never heard...

So here, inside the par block, the "on stdcore[x] ..." statements are executed in parallel with each other, and they communicate using the channel c_aud_dsp.

Now look at the dsp_biquad function:
https://github.com/xcore/sw_audio_effects/blob/master/app_slicekit_biquad/src/dsp_biquad.xc

void dsp_biquad( // Coar that applies a BiQuad filter to a set of audio sample streams
    streaming chanend c_dsp, // DSP end of channel connecting to Audio_IO and DSP coars (bi-directional)
    S32_T biquad_id // Identifies which BiQuad to use
)


In that function after initialization you'll find a while(1) block which executes forever. It could be simplified like this (leaving out the state handling):

// Loop forever
while(1)
{
    // Send/Receive samples over Audio coar channel
#pragma loop unroll
    for (chan_cnt = 0; chan_cnt < NUM_BIQUAD_CHANS; chan_cnt++)
    {
        c_dsp :> inp_samps[chan_cnt];
        c_dsp <: out_samps[chan_cnt];
    }

    samp_cnt++; // Update sample counter

    // Do DSP Processing ...
    process_all_chans( out_samps ,inp_samps ,biquad_id ,NUM_BIQUAD_CHANS );
}



Now the process_all_chans() function referred to there is the one which actually does the DSP on the input samples.

The actual bi-quad implementation can be found here:
https://github.com/xcore/sw_audio_effects/blob/master/module_dsp_biquad/src/biquad_simple.c

It is DSP code written in plain C.
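For readers who don't want to follow the link, a textbook Direct Form I biquad in plain C looks roughly like this.  This is an illustrative sketch with invented names (the linked xcore module uses 32-bit fixed-point arithmetic, not doubles):

```c
typedef struct {
    double b0, b1, b2, a1, a2; /* coefficients, a0 normalised to 1 */
    double x1, x2, y1, y2;     /* previous inputs and outputs */
} biquad_t;

/* Direct Form I:
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
double biquad_step(biquad_t *f, double x)
{
    double y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
             - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1; f->x1 = x; /* shift the input history */
    f->y2 = f->y1; f->y1 = y; /* shift the output history */
    return y;
}
```

Call biquad_step once per sample per channel; the coefficients decide whether it acts as a low-pass, high-pass, peaking EQ, etc.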

Now, if you can and want to use parallel threads, I mean cores, to do something useful for your DSP application, just repeat the above.  One example could be reading the controls, or implementing delays (allpass filters) for reverb effects etc.  Here's btw. an example from the reverb main application (https://github.com/xcore/sw_audio_effects/blob/master/app_slicekit_long_reverb/src/main.xc ):


int main (void)
{
    streaming chan c_aud_dsp; // Channel between I/O and Processing coars
    streaming chan c_dsp_eq; // Channel between DSP-control and Equalisation coars
    streaming chan c_dsp_gain; // Channel between DSP-control and Loudness coars
    chan c_dsp_sdram; // Channel between DSP coar and SDRAM coar

    par
    {
        on stdcore[AUDIO_IO_TILE]: audio_io( c_aud_dsp ); // Audio I/O coar
        on stdcore[DSP_TILE]: dsp_sdram_reverb( c_aud_dsp ,c_dsp_eq ,c_dsp_gain ,c_dsp_sdram ); // DSP control coar for reverb
        on stdcore[BIQUAD_TILE]: dsp_biquad( c_dsp_eq ,0 ); // BiQuad Equalisation coar
        on stdcore[GAIN_TILE]: dsp_loudness( c_dsp_gain ); // non-linear-gain (Loudness) coar
        on stdcore[MEM_TILE]: sdram_io( c_dsp_sdram ); // SDRAM coar
    }
    return 0;
} // main


Even without seeing the other parts I would guess that the c_aud_dsp channel delivers the input and output samples, the c_dsp_eq channel is used to send data to and receive data from the equalizer core, c_dsp_gain does the same for setting gain, and the c_dsp_sdram channel probably writes data to and reads data from the SDRAM (using some internal protocol, I guess).

Now I don't get why the core index names use the postfix _TILE.  I know that the terminology was changed at some point - was it so that what are now tiles were earlier called cores, or the other way around?  I don't remember exactly.  I don't know why they don't just use terms like a "core" containing or supporting "threads", which is what the "cores" actually resemble more than cores.  Takes some time to get into.

Mark, btw, what happened to your python tools for XMOS, could you use these in your current position or is the NextAudio DSP stuff just forgotten and buried (or waiting for better times to reintroduce)?

Digital Larry

Hi mhelin,

Thanks for putting the effort into your response.  Of course I am jumping to conclusions, but it seems that assigning a specific function, e.g. reverb, to a specific core is simply for notational convenience.  I read that core (coars?) coarses?  :icon_biggrin: are scheduled using a round robin technique, so as I understand it, each one executes for awhile and then the switch flips over to the next one.  So even if the code is structured for things to run in parallel, they don't REALLY execute at the same time - simply each process has its own context making things neater and more self contained without a lot of overhead for context switching.  Again, that's a question.

One thing I don't quite grasp here - in my limited FV-1 view of things, one block leads into another and all blocks need to be processed per sample period.  So if the EQ is executing parallel to the reverb, EQ's result won't be ready for reverb's input until the next sample.  I probably glossed over it, but how are the blocks connected together and is there a resulting need to offset functional blocks this way?  So if 6 cores were executing DSP code, which logically was connected serially one into the other, would you have this problem, or would you be advised not to do it that way?

I was just looking at "OpenStomp" http://howleraudio.com/frontpage/ - unfortunately the Forum is offline  :icon_sad:.  This uses a Propeller 8-core chip, each core operating "simultaneously at 80 MHz".  And I'm still wondering whether multi-core gives me something for the DSP code itself that I don't get otherwise.  I can certainly see separate cores being used for I/O and LFO generation, along with a shared memory space where things can be exchanged between cores.

The end result of all this prattle is:
1) I'm getting close to the point with SpinCAD Designer (currently supporting FV-1 only) where people will be able to add their own functional blocks, so what do I do next?
2) The XMos dev kits are so inexpensive I'd be an idiot to worry about that aspect (cost)
3) I am probably going to grab one of these and an audio slice and just see what happens from there.  Because I do suffer from occasional frustration at the 128-instruction limitation of the FV-1.  Sort of like being a painter but always having to fit your ideas on a 3 x 5 inch card.
4) I'm also quite interested in starting to use a platform that looks like it has a way forward.  I don't think that given Keith Barr's unfortunate demise in 2010, Spin is busy cranking out a next generation chip.  I just don't see that happening, sad to say.  But I AM pretty motivated to try to offer something to people so that they can do DIY on a DSP platform without having to learn DSP or programming.
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

mhelin

#34
Quote from: Digital Larry on December 20, 2013, 09:27:39 AM
Hi mhelin,

Thanks for putting the effort into your response.  Of course I am jumping to conclusions, but it seems that assigning a specific function, e.g. reverb, to a specific core is simply for notational convenience.  I read that core (coars?) coarses?  :icon_biggrin: are scheduled using a round robin technique, so as I understand it, each one executes for awhile and then the switch flips over to the next one.  So even if the code is structured for things to run in parallel, they don't REALLY execute at the same time - simply each process has its own context making things neater and more self contained without a lot of overhead for context switching.  Again, that's a question.

Yes, each core is just guaranteed a minimum performance.  For a 500 MHz device it's 125 MIPS/core if you are using four cores or fewer.  The I/O is handled using a 100 MHz clock, so it is a little bit slower.

Quote
One thing I don't quite grasp here - in my limited FV-1 view of things, one block leads into another and all blocks need to be processed per sample period.  So if the EQ is executing parallel to the reverb, EQ's result won't be ready for reverb's input until the next sample.  I probably glossed over it, but how are the blocks connected together and is there a resulting need to offset functional blocks this way?  So if 6 cores were executing DSP code, which logically was connected serially one into the other, would you have this problem, or would you be advised not to do it that way?

Not sure if I really understood your question; anyway, some stuff doesn't seem to be quite parallel to me but more like serial.  The tasks (= functions called inside the par statement) each run in their own logical core, but they are also synchronized.  So the reverb main task is waiting for the eq results before it can continue, but as you pointed out, it just receives the results of the computations on the previous sample.  That's why each DSP task starts by reading the channel input and writing the output:

// Service channels in chronological order
c_aud_dsp :> inp_set_s.samps[chan_cnt]; // Receive input samples from Audio I/O coar
c_aud_dsp <: out_set_s.samps[chan_cnt]; // Send Output samples back to Audio I/O coar



Quote
I was just looking at "OpenStomp" http://howleraudio.com/frontpage/ unfortunately the Forum is offline  :icon_sad:.  This uses a Propeller 8-core chip, each operating "simultaneously at 80 MHz".  And I'm still wondering whether multi-core gives me something for the DSP code itself that I don't get otherwise.  I can certainly see separate cores being used for I/O and LFO generation, along with a shared memory space where things can be exchanged through cores.

The end result of all this prattle is:
1) I'm getting close to the point with SpinCAD Designer (currently supporting FV-1 only) where people will be able to add their own functional blocks, so what do I do next?
2) The XMos dev kits are so inexpensive I'd be an idiot to worry about that aspect (cost)
3) I am probably going to grab one of these and an audio slice and just see what happens from there.  Because I do suffer from occasional frustration at the 128-instruction limitation of the FV-1.  Sort of like being a painter but always having to fit your ideas on a 3 x 5 inch card.
4) I'm also quite interested in starting to use a platform that looks like it has a way forward.  I don't think that given Keith Barr's unfortunate demise in 2010, Spin is busy cranking out a next generation chip.  I just don't see that happening, sad to say.  But I AM pretty motivated to try to offer something to people so that they can do DIY on a DSP platform without having to learn DSP or programming.

Regarding 1) and 2) you could really consider adding support for XMOS chips.  However, I wouldn't rely on just the XMOS kits - we don't know how long the startKIT will be sold, for example.  It's so inexpensive that it can't make much money for the company.  Also the 4-channel Audio Slice maybe isn't the best interface for guitar players.  Just today I thought of designing a board with a simple ADC and DAC (using the cheap and simple Wolfson converters, WM8783 ADC and WM8727 DAC) with half a megabyte or one megabyte of SRAM (4-8x Microchip 23LC1024) for delay/reverb FX.  It would be better to make a board with the XMOS chip + converters + SRAM + pots (the XMOS analog devices come with A/D's), just like Mark was planning to do, but it would have to be manufactured using pick-and-place machines etc. because it's no longer DIY-able to solder the BGA parts.

Anyway, apart from the concurrency stuff (https://www.xmos.com/en/support/documentation?subcategory=Programming%20in%20C%20and%20XC&component=14806&page=2) and the I/O, the XMOS microcontrollers / CPU's seem to be quite usual ones.  I don't know though (haven't tested) what happens if you have more parallel tasks for execution than there are available cores - my guess is that the compiler raises an error in that case.  So you will have a problem there: what to do if there are more tasks (SpinCAD processing blocks) than there are available resources (cores)?  Of course you can add cores by linking two or more boards together (adding links) or using a device with more tiles.  Some kind of modular design would fit nicely - one board could handle the I/O's and the main structure (running the main.xc) and the linked boards (extenders) would then add processing resources.


Digital Larry

Ok, regarding the Propeller CPU, 8 cores at 80 MHz each = 640 MHz total seems reasonable.

Regarding the synchronization of DSP tasks running in parallel, where e.g. the reverb takes the output of the EQ from the previous sample period -

1) I doubt it makes any audible difference until you have several dozen such delays.
2) There's also no reason not to simply combine the DSP and EQ code and run them serially on the same core.  Parallelism for its own sake isn't buying us anything here, that I can see - unless the architecture forces us to do it.
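Point 2) can be shown with a minimal sketch (the stage functions are invented stand-ins, not anything from the thread): two blocks chained serially on one core see each other's results within the same sample period, with no one-sample offset.

```c
/* Two stand-in processing stages run serially on one core: the EQ
   output feeds the gain stage in the same sample period, so there is
   no one-sample pipeline offset between the blocks. */
static double eq_stage(double x)   { return 0.5 * x; } /* dummy EQ */
static double gain_stage(double x) { return 2.0 * x; } /* dummy gain */

double process_sample(double x)
{
    double eq_out = eq_stage(x); /* result available immediately... */
    return gain_stage(eq_out);   /* ...and consumed this same sample */
}
```

The cost is that both stages now share one core's MIPS budget instead of each getting their own.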

I can't remember where exactly I saw it, but I believe I saw a screenshot of an effects design software tool for a multi-core based design, where it seemed you were compelled to organize the effects along the lines of the CPU multi-core architecture.  And I think this may also be true for the Xmos chips, and this is what I'm trying to find out.  Let's suppose I'm trying to devote a single core to the bulk of my DSP processing because, independent of the architecture, my algorithm is simply linear, but too long to execute in 1/8th of a sample period.  It seems like I still may need to chop it into smaller pieces because I can't turn unused cores off, and the round robin scheduler will still switch to those cores even if they aren't executing any code.

SpinCAD Designer, being targeted for the FV-1, knows nothing of cores and is designed to just spit out one combined blob of DSP code which is result of resolving references and offsets to the smaller functional blocks that were assembled in the editor UI.  To move from that approach, to one where I have to split things up so that each core gets a roughly equal set of instructions to execute, adds a level of complexity that wasn't there before.  So before I entertain that notion I want to be sure it's really necessary.
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

mhelin

#36
In the XCore scheduler there is a set of runnable cores (things previously known as threads); for example, cores that are waiting for an event are excluded from that set.  The runnable cores are then scheduled into the execution pipeline, which can simultaneously perform multiple different types of instructions (register write, read, ALU operations etc.).  Anyway, the execution pipeline can contain at most one instruction per core.  However, it's still not guaranteed that if only a single core is running it would get all the execution time - only 1/4 of the available processing time is guaranteed if there are four or fewer cores active (= parallel tasks running), but it might actually be more than that.

https://www.xmos.com/download/public/The-XMOS-XS1-Architecture%281.0%29.pdf

"The set of n threads can therefore be thought of as a set of virtual processors each with clock rate at least 1/n of the clock rate of the processor itself. The only exception to this is that if the number of threads is less than the pipeline depth p, the clock rate is at most 1/p".

Now replace every "thread" with "core" to use the terms of today (decided by the XMOS marketing team, I guess).

Here is another good document about performance:

https://www.xmos.com/en/download/public/DSP-performance-on-XS1-L-device%28X7424A%29.pdf

Regarding FIR filters, for example, it is possible to divide a direct-form FIR across several cores (just divide the filter coefficients into equal parts and sum the outputs).  That could be used for example in guitar amp modelling and especially in speaker cabinet modelling.
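The coefficient-splitting idea can be sketched in plain C.  This partitioning scheme is my own illustration; on an xcore device each slice would run on its own logical core and the partial sums would be combined over channels:

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k].
   Here x[] holds the most recent `taps` samples, x[0] = newest. */
double fir_direct(const double *h, const double *x, size_t taps)
{
    double acc = 0.0;
    for (size_t k = 0; k < taps; k++)
        acc += h[k] * x[k];
    return acc;
}

/* Same filter split into `parts` slices: each slice handles a
   contiguous run of coefficients and the partial sums are added.
   Each slice is independent, so each could run on its own core. */
double fir_partitioned(const double *h, const double *x,
                       size_t taps, size_t parts)
{
    double acc = 0.0;
    size_t chunk = (taps + parts - 1) / parts; /* ceil(taps/parts) */
    for (size_t p = 0; p < parts; p++) {
        size_t start = p * chunk;
        size_t end = start + chunk < taps ? start + chunk : taps;
        double partial = 0.0;
        for (size_t k = start; k < end; k++)
            partial += h[k] * x[k];
        acc += partial; /* combine the per-slice partial sums */
    }
    return acc;
}
```

Both functions compute the same output; only the order of accumulation differs, which is why the split works for long cabinet-impulse FIRs.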

markseel

#37
Good questions DigitalLarry and great answers mhelin.

As far as the multi-core scheduling and MIPs distribution go, look at it this way: each 32-bit core that has work to do (i.e. is not blocked by a pending I/O event, incoming message, or timer expiration) will be part of the scheduler's run-set.  The max number of cores in a run-set is eight and the minimum is four.  That's why each core ranges from 62.5 to 125 MIPs for a 500 MHz device.  If only one core has work to do the run-set count is still four; one slot holding the core and the others being empty.  There's a good reason for this lower bound of four; I believe it's due to a four-stage pipeline architecture in which the pipeline stages for each core are staggered one slot apart.  This allows RAM, the I/O ports, the 32x32 MAC, etc, to be hardware blocks shared amongst the 32-bit cores.  So RAM can be accessed at 500 MHz but the actual access per core is staggered due to each core being at a different stage in its own pipeline.  At least that's how I think it works.  Anyway, it's a simple and elegant architecture that's clock-level deterministic and fully simulatable while being true multi-core and supporting apps that span multiple cores.
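The scheduling rule described above (run-set clamped to the four-slot pipeline depth) works out to a simple formula.  A throwaway sketch of the arithmetic, not XMOS code:

```c
/* Guaranteed MIPS per active core on a 500 MHz single-tile device:
   the scheduler's run-set never drops below the pipeline depth (4),
   so even one busy core only gets every fourth issue slot. */
double mips_per_core(unsigned active_cores)
{
    unsigned runset = active_cores < 4u ? 4u : active_cores;
    return 500.0 / runset;
}
```

So one to four busy cores each get 125 MIPS, and a fully loaded eight-core tile gives 62.5 MIPS per core.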

DigitalLarry asked a good question that seems to be rooted in looking for the simplest way to implement a DSP system with algorithms chained together such that data flows through a system of arbitrarily defined processing blocks.  If all I had to do was execute a collection of DSP algorithms linked together in some dataflow-like architecture then a single 500 MHz core would be the simplest way to go.  Having to break up the system into separate parts just adds complexity with no value; where to separate tasks and synchronize them, how to pass data to/from each task, how to load-balance the cores and utilize all the MIPs, etc.  But the value of the XMOS XS1 is to allow oftentimes very unrelated processing to occur while guaranteeing MIPs and providing deterministic and low-latency event handling.  So a system with some DSP, some communications protocols, a user interface, maybe some PWM or I2S/SPI/I2C/UART peripheral drivers, would be a good fit.

Some benefits that I personally enjoy with XMOS are less talked about, more subjective, and due to personal experience/preference.  I don't like trying to use ARM/PIC32 peripherals after using XMOS - they require understanding of the peripheral register sets (many registers, each having many bit-fields) and you have to work through peripheral-to-pin assignment and pin function multiplexing.  I also grew tired of interrupts for hard real-time work - too much event handling jitter.  Pulling in an RTOS added nice capabilities for concurrency (pseudo concurrency) but also adds complexity and an additional learning curve.  You end up with a complicated ecosystem; a 32-bit MCU, someone's tool-chain, lots of peripheral registers, perhaps a peripheral library, an RTOS and associated documentation, and the requirement to make sure all of those work together nicely.

My experience with XMOS (on the job) was completely different.  The RTOS primitives are part of the hardware - they're already there and are easy to use (threads/cores, message passing, round-robin scheduling, event handling).  But you only get 8 cores/threads so that's a challenge sometimes.  The I/O ports require some work to understand but after that the door's wide open - you can do just about anything.  Peripherals are now defined by source code rather than a bunch of register writes and interrupt handlers - this is way easier (for me anyway) to debug and to assess for intended peripheral protocol/behavior.  And new or custom protocols are always an option.  The tool-chain is multi-core and XS1 architecture aware.  A PIC32 or ARM Cortex M4 will have a bit lower power consumption than an XS1 (I bench-marked FIR's on all three systems) for fixed-point DSP.  A TI or Analog Devices DSP will do even better.  Built-in peripherals on ARMs/PIC32s are pretty much guaranteed to work once you get them set up right and will consume less power than an XS1 soft peripheral.  But they can't be customized.  An ARM/PIC32 with an RTOS is a fine way to go but you have to consider the IDE/toolchain, RTOS, MCU/DSP and peripheral library (if you use one) compatibilities and learning curves.  After it's all said and done I was able to get a project up and going with XMOS chips and tools faster than with any other 32-bit micro-controller platform that I've used.  Shorter bug-fix cycles, better debugging tools, fewer real-time unknowns.  OK, I'm done. ;D

The startKIT is only $15.  I think we should make an adapter for it that sports a decent audio CODEC, guitar-level input/output circuitry, some LED's, some pots, etc.  Plus the startKIT has a place to plug in a RaspberryPi (could be used for effects control, status display or something like that) if you're into that sort of thing.


Digital Larry

Thanks for the detailed remarks Mark!  After I dig myself out of the post-holiday clutter I'll see what happens 8^).

DL
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer