Some of you might have heard about the new Cortex-M4 ARM Processor.
It’s a superset of the Cortex-M3, meaning it can run Cortex-M3 instructions while adding new ones. The processor is primarily targeted at DSP applications (which seem to be everywhere these days), but I suspect it will be good for almost any application. Texas Instruments, NXP and Freescale have licensed the core, which promises an ample supply of choices for your next project.

Another huge advantage of the Cortex-M4 is the addition of SIMD. If your data is 8 or 16-bit, you’ll be able to perform multiple operations in one clock cycle, giving a huge boost to many algorithms. The following table shows the difference in cycle counts for several instructions:
Instruction | Description | Cortex-M3 (cycles) | Cortex-M4 (cycles) |
MLA | Multiply accumulate | 2 | 1 |
MLS | Multiply subtract | 2 | 1 |
SMULL | Long signed multiply | 3 to 5 | 1 |
UMULL | Long unsigned multiply | 3 to 5 | 1 |
SMLAL | Long signed multiply accumulate | 4 to 7 | 1 |
UMLAL | Long unsigned multiply accumulate | 4 to 7 | 1 |
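To make the long multiply accumulate rows concrete, here is a minimal C sketch of the kind of loop that benefits. The function name `dot_product_q31` is made up for illustration, and whether a given compiler actually emits SMLAL for the loop body is an assumption that depends on the toolchain and optimization settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two int32_t arrays, accumulated in
   64 bits. Each iteration is a 32x32 -> 64-bit signed multiply followed by
   a 64-bit add, which is the operation SMLAL performs as one instruction. */
int64_t dot_product_q31(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product is computed in 64 bits. */
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

Going by the table above, the multiply accumulate step of each iteration drops from 4 to 7 cycles on the Cortex-M3 to 1 cycle on the Cortex-M4. The loads and loop overhead are unchanged.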
Many algorithms will benefit from the single-cycle instructions, and even more from the SIMD instructions. There will probably be a lot of demand for those of us who still use Assembly.
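As a rough picture of what the SIMD instructions do with 16-bit data, the sketch below models a dual 16-bit multiply accumulate in portable C. It is only a model that runs on any machine, not the instruction itself: `pack16` and `dual_mac16` are names invented for this example, and it assumes the result fits in 32 bits. On a real Cortex-M4 the same arithmetic is what the SMLAD instruction does, which CMSIS exposes as the `__SMLAD` intrinsic.

```c
#include <stdint.h>

/* Hypothetical helper: pack two signed 16-bit values into one 32-bit word,
   with lo in bits 0-15 and hi in bits 16-31. */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Hypothetical portable model of a dual 16-bit multiply accumulate:
   returns acc + lo(x)*lo(y) + hi(x)*hi(y).
   Assumes the sum does not overflow a signed 32-bit value. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    /* Unpack each half-word and reinterpret it as a signed 16-bit value. */
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);

    /* Two multiplies and two adds; the SIMD instruction does all of this
       on the packed words at once. */
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}
```

For example, `dual_mac16(pack16(3, -4), pack16(5, 6), 10)` computes 10 + 3*5 + (-4)*6. In a 16-bit FIR filter or dot product, packing two samples per word this way halves the number of multiply accumulate instructions in the inner loop.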
For more information:
Cortex-M3 Technical Reference Manual