Amazing performance of i.MXRT!
How to implement CoreMark and measure performance


I'm going to measure the performance of my favorite board, the i.MXRT1050-EVK, and show you how to implement CoreMark to do it. The i.MXRT1050 is a crossover processor and the highest-performing of NXP's microcontrollers.

CoreMark is a benchmark that helps you select an MCU that meets your system requirements by letting you compare the performance of different MCUs on equal terms.

Because CoreMark can run on almost any MCU, measuring core performance is easy. This time I got an i.MXRT1050-EVK board, and I wanted to know how well it really performs.

You will be surprised at the amazing performance!

EVK board to be measured

NXP i.MX RT1050-EVK board

As mentioned above, I used the i.MXRT1050-EVK. The part marking printed on the device itself actually reads i.MXRT1052.

i.MX RT1050 specification

Here are the specifications and features of the i.MXRT1050.

Processor core: ARM Cortex-M7 (double-precision FPU)
Clock frequency: max. 600 MHz
Flash memory: none
RAM: TCM memory, 256 kB
Supported external memory: SDRAM, SRAM, NOR, NAND, QSPI flash

 

Program execution

This device has no on-chip flash ROM.

NO FLASH!

It has to execute program code from RAM, SDRAM, or QSPI flash.

QSPI flash XIP is supported!

Instead of on-chip flash, i.MXRT supports QSPI XIP (eXecute In Place), so program code can be executed directly from QSPI flash without first being copied into RAM. This approach can make the system BOM cost lower than with an embedded-flash MCU.

Of course, executing from external memory costs extra access cycles.

Speed-critical code should be placed in RAM and executed from there.

To compensate for the access penalty, relatively large caches (32 kB I-cache, 32 kB D-cache) are integrated to boost system performance.

IDE (Integrated Development Environment)

I used IAR Embedded Workbench for ARM (EWARM) for the benchmark.

You can download it here: IAR Embedded Workbench for ARM

Getting CoreMark

Now I will show you how to implement CoreMark.

First, you need to download CoreMark from EEMBC.

You can download it after registering.

Download here: EEMBC-COREMARK

Note: Redistribution of the CoreMark source code is not permitted, so you cannot give your customers a project that contains your CoreMark port.

Files you need

(Screenshot: files required for CoreMark)

When you use the MCUXpresso SDK, all you need are six C source files (.c) and two header files (.h).

  • core_portme.c
  • core_portme.h
  • core_list_join.c
  • core_main.c
  • core_matrix.c
  • core_state.c
  • core_util.c
  • coremark.h

Some of these files need to be modified to suit your environment.

Now, let's see how.

Base project I used

I used the Hello World sample that comes with the MCUXpresso SDK. I implemented CoreMark on top of this project because everything, including the startup files and clock configuration, is already set up by default.

You can download the SDK here: MCUXpresso SDK

Implementation steps

1. Add the GPT timer driver and the CoreMark source files to the project

Add GPT driver

To measure the elapsed time for CoreMark, I used the GPT (General Purpose Timer). You can use another timer if one is available.

However, the GPT driver is not included in the Hello World project. You need to add it manually from the SDK path shown below.

i.MXRT1050 SDK folder/drivers/evkimxrt1050/drivers/

This folder contains all the drivers. You can simply drag and drop the GPT driver files into the drivers group of the EWARM project.

(Screenshot: adding the GPT driver)

Add coremark files

I created a group named "coremark" in the EWARM project and added the CoreMark files to it by drag and drop, the same way as the GPT driver.

2. Modify core_main.c

At line 89, MAIN_RETURN_TYPE main(void) { must be changed, because the function name clashes with the main() of the Hello World project. Rename it to something you can easily recognize as the CoreMark entry point.

I renamed it to main_coremark(void).


#if MAIN_HAS_NOARGC
//MAIN_RETURN_TYPE main(void) {
MAIN_RETURN_TYPE main_coremark(void) {
int argc=0; char *argv[1];
#else
MAIN_RETURN_TYPE main(int argc, char *argv[]) {
#endif

3. Modify hello_world.c

Add #include "coremark.h"

Add #include "fsl_gpt.h"

I modified the main() function as shown below. The GPT is initialized and its clock (the IPG clock) is divided by 2.
The GPT definitions also need to be added at the top of hello_world.c.


#define GPT_IRQ_ID GPT1_IRQn
#define EXAMPLE_GPT GPT1
#define EXAMPLE_GPT_IRQHandler GPT1_IRQHandler
   
/* Select IPG Clock as PERCLK_CLK clock source */
#define EXAMPLE_GPT_CLOCK_SOURCE_SELECT (0U)
/* Clock divider for PERCLK_CLK clock source */
#define EXAMPLE_GPT_CLOCK_DIVIDER_SELECT (0U)
/* Get source clock for GPT driver (GPT prescaler = 0) */
#define EXAMPLE_GPT_CLK_FREQ (CLOCK_GetFreq(kCLOCK_IpgClk) / (EXAMPLE_GPT_CLOCK_DIVIDER_SELECT + 1U)) 


int main(void){

    gpt_config_t gptConfig;

    /* Init board hardware */

    BOARD_InitPins();
    BOARD_BootClockRUN();
    BOARD_InitDebugConsole();

    /*Clock setting for GPT*/
    CLOCK_SetMux(kCLOCK_PerclkMux, EXAMPLE_GPT_CLOCK_SOURCE_SELECT);
    CLOCK_SetDiv(kCLOCK_PerclkDiv, EXAMPLE_GPT_CLOCK_DIVIDER_SELECT);

    /*GPT timer is setup for measurement of coremark */
    GPT_GetDefaultConfig(&gptConfig);

    /* Initialize GPT module */
    GPT_Init(EXAMPLE_GPT, &gptConfig);

    /*Divide GPT clock source frequency by 2 inside GPT module */
    GPT_SetClockDivider(EXAMPLE_GPT, 2);

    /* Start Timer */
    PRINTF("\r\nStarting GPT Timer...");

    GPT_StartTimer(EXAMPLE_GPT);

    /* Coremark start */
    main_coremark();

    while(1);

}

4. Modify core_portme.h

Add #include "stdlib.h"

Comment out #define NULL ((void *)0)

In this header file you need to configure the number of iterations CoreMark runs, the clock frequency of the timer you use, and the compiler information.

I configured it as follows.


#define ITERATIONS   30000  // Number of CoreMark iterations; around 30000 seems to work well for the i.MXRT1050
#define CLOCKS_PER_SEC  150000000 // Timer clock frequency in Hz
#define COMPILER_VERSION "IAR EWARM v8.20.2" // Compiler information
#define COMPILER_FLAGS "SPEED" // Compiler flags; I set the optimization to SPEED in the IAR EWARM compiler settings
#define MEM_LOCATION "RAM" // Where the code executes from

The data types and settings depend on your environment; I configured them as follows because I use the MCUXpresso SDK.

HAS_TIME_H 0
USE_CLOCK 0
HAS_STDIO 1
HAS_PRINTF 1


/************************/
/* Data types and settings */
/************************/
/* Configuration : HAS_FLOAT 
	Define to 1 if the platform supports floating point.
*/
#ifndef HAS_FLOAT 
#define HAS_FLOAT 1
#endif
/* Configuration : HAS_TIME_H
	Define to 1 if platform has the time.h header file,
	and implementation of functions thereof.
*/
#ifndef HAS_TIME_H
//#define HAS_TIME_H 1
#define HAS_TIME_H 0
#endif
/* Configuration : USE_CLOCK
	Define to 1 if platform has the time.h header file,
	and implementation of functions thereof.
*/
#ifndef USE_CLOCK
//#define USE_CLOCK 1
#define USE_CLOCK 0
#endif
/* Configuration : HAS_STDIO
	Define to 1 if the platform has stdio.h.
*/
#ifndef HAS_STDIO
//#define HAS_STDIO 0
#define HAS_STDIO 1
#endif
/* Configuration : HAS_PRINTF
	Define to 1 if the platform has stdio.h and implements the printf function.
*/
#ifndef HAS_PRINTF
//#define HAS_PRINTF 0
#define HAS_PRINTF 1
#endif

5. Modify core_portme.c

Comment out the #error line in barebones_clock() and return the GPT counter value instead.


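#include "fsl_gpt.h" /* near the top of core_portme.c, for GPT_GetCurrentTimeCount() */

CORETIMETYPE barebones_clock(){
    //#error "You must implement a method to measure time in barebones_clock"
    return GPT_GetCurrentTimeCount(GPT1); /* current GPT1 counter value in timer ticks */
}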

Likewise, comment out the #error line inside portable_init().


void portable_init(){
    //#error "Call board initialization routines in portable init (if needed), in particular"
    if (sizeof(ee_ptr_int) != sizeof(ee_u8 *)){
:
:

6. Set the include paths in the EWARM options

Lastly, you need to add the include paths in the EWARM project options. Sorry, the screenshot below is in Japanese because my PC runs a Japanese environment.

(Screenshot: EWARM include path settings)

Measurement result

The score is 2,943, which is way more than I expected!!

The i.MXRT1050 runs at 600 MHz, so that is about 4.9 CoreMark/MHz!

To show how amazing this number is: the Cortex-M4F core of the Kinetis K60 (100 MHz) scores 270 in CoreMark, or 2.7 CoreMark/MHz.

That makes the i.MXRT1050 about 11 times faster than the Kinetis MCU, with nearly double the CoreMark/MHz!

 

Summary

I showed you how to implement CoreMark and the result I measured. CoreMark can be ported to basically any other MCU in much the same way.

You can use this article as a reference.