What’s the performance?
QN9080 Fusion Signal Processor


The QN9080 integrates an Arm Cortex-M4F core, so DSP instructions are available for algorithm computation.

The QN9080 also integrates a co-processor called the Fusion Signal Processor (FSP), which is well suited to running algorithms on sensor data.

Thanks to the FSP, sensor algorithm processing can run in parallel with a high-priority task such as Bluetooth stack processing. The FSP is also much faster than the Cortex-M4F using its DSP instructions.

How fast is it, then?

OK, let me show you the performance of the QN9080 Fusion Signal Processor.

What processing is done on sensor data?

In general, what do you need in order to use sensor data?

Sensor data is basically analog. It is usually converted to a digital signal by an analog-to-digital converter (ADC).

The converted digital signal cannot be used as-is, however, because it contains a lot of noise.

The noise needs to be filtered out.
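To make the filtering step concrete, here is a minimal sketch of one common choice, a first-order low-pass filter (exponential moving average). The function name lowpass_filter and the parameter alpha are my own for illustration; a real sensor pipeline may well need a different filter type and tuning.

```c
#include <stddef.h>

/* First-order IIR low-pass filter (exponential moving average).
 * alpha is in (0, 1]: smaller values smooth more but react more slowly.
 * lowpass_filter is a hypothetical helper, not an SDK or CMSIS function. */
void lowpass_filter(const float *in, float *out, size_t n, float alpha)
{
    if (n == 0) {
        return;
    }
    out[0] = in[0];  /* seed the filter with the first sample */
    for (size_t i = 1; i < n; i++) {
        /* move the output a fraction alpha toward the new sample */
        out[i] = out[i - 1] + alpha * (in[i] - out[i - 1]);
    }
}
```

With alpha = 1 the filter passes the input through unchanged, so alpha is the single knob that trades noise rejection against response time.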

Besides filtering high-frequency noise out of the sensor signal, you might also need, for example, an integration step when you calculate speed from accelerometer data.
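As a rough illustration of that integration step, the sketch below accumulates acceleration samples into a velocity change with the trapezoidal rule. It assumes evenly spaced samples; integrate_acceleration and dt are names I made up for this example.

```c
#include <stddef.h>

/* Integrate acceleration samples (m/s^2), taken every dt seconds, into a
 * velocity change (m/s) using the trapezoidal rule.
 * integrate_acceleration is a hypothetical helper for illustration. */
float integrate_acceleration(const float *accel, size_t n, float dt)
{
    float velocity = 0.0f;
    for (size_t i = 1; i < n; i++) {
        /* area of the trapezoid between two neighbouring samples */
        velocity += 0.5f * (accel[i - 1] + accel[i]) * dt;
    }
    return velocity;
}
```

Any offset left in the acceleration data is integrated too and shows up as drift in the speed, which is one reason the filtering step comes first.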

Besides filtering, integration and differentiation, frequency analysis is also common, and it very often uses the FFT (Fast Fourier Transform).

Algorithms such as filtering and FFT frequently rely on mathematical functions such as sine, cosine and power, as well as matrix and floating-point calculations.

There is CMSIS-DSP, too!

CMSIS_logo
Reference: ARM CMSIS-DSP documentation page

URL: http://www.keil.com/pack/doc/CMSIS/DSP/html/index.html

For mathematical functions like sine, cosine and power, you can use the built-in functions of the C standard library, which are declared in math.h.

You can also use the CMSIS-DSP library instead of the C standard math library when the device has an Arm Cortex-M core.

Recommended reading
The previous article (Let’s use CMSIS! Easy way to measure cycle count.) talked about CMSIS-CORE. It shows an easy way to measure cycle counts using the SysTick timer.
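That article is not reproduced here, but the core of a SysTick-based measurement is small. As a minimal sketch, assuming SysTick is reloaded from 0xFFFFFF and wraps at most once during the measurement, the elapsed cycles can be computed from two counter readings like this. systick_elapsed and SYSTICK_MAX are hypothetical names of mine, not part of CMSIS.

```c
#include <stdint.h>

/* SysTick is a 24-bit down-counter; this sketch assumes it is reloaded
 * from the maximum value 0xFFFFFF. */
#define SYSTICK_MAX 0x00FFFFFFu

/* Cycles elapsed between two counter readings. Because the counter counts
 * down, start is normally larger than end; masking to 24 bits also gives
 * the right answer when the counter wrapped once in between.
 * On the target, both readings would come from SysTick->VAL. */
uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    return (start - end) & SYSTICK_MAX;
}
```

The 24-bit width limits a single measurement to about 16.7 million cycles, so longer runs need a wrap counter or a different timer.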

CMSIS-DSP is already optimized for Cortex-M cores

What is CMSIS-DSP? It is a library of common signal processing functions optimized for Cortex-M cores, and it generally runs faster than the C standard math library (math.h).

CMSIS-DSP supports Cortex-M0/M0+/M3 as well as Cortex-M4F, even though the DSP instructions are not available on those cores.

On a core without DSP instructions and an FPU, such as the Cortex-M0, the code is of course larger and slower than on the M3/M4F.

Then, which is faster, FSP or Cortex-M4F using CMSIS-DSP?

First, I will explain how to use CMSIS-DSP on a Cortex-M core and how to use the Fusion Signal Processor.

Then, I will get both working and benchmark them.

How to use CMSIS-DSP

1. Include arm_math.h

arm_math_include
Including arm_math.h

To use mathematical functions like sine and cosine from the C standard math library, you need to include math.h.

Likewise, to use the mathematical functions available in CMSIS-DSP, you need to include arm_math.h. It is quite easy!

2. Put the CMSIS-DSP library in the project

CMSIS_DSP_library
The CMSIS-DSP library needs to be added in the linker settings.

The CMSIS-DSP library itself is stored in the folder below.

SDK package folder/libs/arm_cortexm4lf_math.a

You need to specify the library under Option-Property -> C/C++ Build -> MCU Linker -> Libraries, as in the picture above. Enter the library name “arm_cortexm4lf_math”, without the .a extension.

For the library search path, enter the CMSIS-DSP library path given above. Again, see the picture above.

3. Enabling hardware floating point (FPU)

Enabling the FPU
FPU setting

The Cortex-M4F has a hardware floating-point unit (FPU). Once the FPU is enabled, you can use it.

You need to set “Architecture” to “FPv4-SP(Hard ABI)” to enable it.

The same setting must also be made in the Assembler and Linker settings.

CMSIS-DSP is included in the MCUXpresso SDK (if you select it as a software component), and the required settings are then already in place.

How to use Fusion Signal Processor (FSP)

FSP header include
Including the FSP header file

All you have to do to use the FSP is include fsl_fsp.h.

Unlike CMSIS-DSP, no library needs to be added to the project.

Now you can call the FSP API, the FSP_xxx() functions.

For Your Information
The FSP API looks like FSP_xxx_cmsisApi(); it is not exactly the same as the CMSIS API, but it is quite similar, with FSP_xxx added as a prefix at the head of the function name.
Because of that, porting from CMSIS-DSP to the FSP math functions is relatively easy.

For detailed explanations of the FSP functions, see the FSP documentation, where they are all listed.

Benchmarking

FSP sample code is available in the QN9080 SDK, which you can download from the MCUXpresso SDK web page (https://www.nxp.com/mcuxpresso).

The sample code runs FFT, matrix and power calculations and measures the cycle count for each case.

Here is the sample code. Only the power calculation part is shown below.


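void power_example(void)
{
    uint32_t fsp_cycles, mcu_cycles;

    // Fill the first 256 input samples with a ramp: 0.0, 0.1, 0.2, ...
    // Note: only 256 of the 1024 processed elements are written here;
    // the rest keep whatever the buffer already holds.
    for (uint32_t i = 0; i < 256; i++)
        power_acc_input[i] = 0.1 * i;

    // Completion flag, presumably set by the FSP interrupt handler
    // elsewhere in the sample
    s_se_done = 0;
    // FSP: start the sum of squares on the co-processor and wait for it
    START_COUNTING();
    FSP_PowerIntF32(DEMO_FSP_BASE, power_acc_input, 1024);
    while (!s_se_done)
    {
    }
    FSP_GetPowerIntResultF32(DEMO_FSP_BASE, &power_acc_fsp_output);
    fsp_cycles = GET_CYCLES();

    // MCU: the same calculation on the Cortex-M4F with CMSIS-DSP
    START_COUNTING();
    arm_power_f32(power_acc_input, 1024, &power_acc_mcu_output);
    mcu_cycles = GET_CYCLES();
    PRINTF("-- Sum of the squares of the 1024 elements\r\n");
    PRINTF("FSP %d cycles, MCU %d cycles\r\n", fsp_cycles, mcu_cycles);
}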
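Both calls in the sample compute the same thing, the sum of the squares of the input elements, as the PRINTF text says. For checking the two outputs against a known-good value, a plain C reference is handy. This is only a sketch of mine (power_reference is a hypothetical name), not code from the SDK.

```c
#include <stddef.h>

/* Plain C reference for the power calculation: the sum of the squares of
 * n samples. power_reference is a hypothetical helper, not an SDK function. */
float power_reference(const float *src, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += src[i] * src[i];
    }
    return sum;
}
```

Floating-point sums depend on the order of accumulation, so the FSP, CMSIS-DSP and reference results may differ in the last digits. Comparing them with a small relative tolerance is safer than testing for equality.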

Measurement result

Execution cycle count measurement results
Measurement results of FSP and Cortex-M4F (CMSIS-DSP) cycle counts.

These results show that the FSP is much faster than the Cortex-M4F using CMSIS-DSP in every calculation.

In particular, it is about 7 times faster for the FFT and about 6 times faster for the power calculation.

The FSP is a co-processor alongside the Cortex-M4F, so it can offload work from the core.

Summary

This time, I measured the performance of the Fusion Signal Processor, which turned out to be much faster than the Cortex-M4F core: about 6 to 7 times faster for the FFT and power calculations.

Small wearable devices require low power, BLE wireless connectivity and sensor algorithm processing. The QN9080 meets all of these requirements.