zephyr: gptp does not work well on NXP rt series platform
Description of bug
Problem with the gptp_event_capture() function in the gPTP demo in ZephyrOS on the RT1020EVK: time.second member of net_ptp_time struct is updated correctly, but time.nanosecond is NOT. The same function in the same demo on the FRDM-K64F works correctly.
Background
We are trying to use gPTP (IEEE 802.1AS) on the i.MXRT platform to synchronize timestamps across devices in a sensor network. All the hardware documentation for the i.MXRT and App Notes such as AN12149 (https://www.nxp.com/docs/en/nxp/application-notes/AN12149.pdf) indicate that this should be possible. We are interested in using ZephyrOS because we feel it should work well with the way we develop embedded systems, which are at the moment all based on Linux and Yocto. Zephyr seems to have good network support, and gPTP support, at least on some platforms (https://docs.zephyrproject.org/latest/reference/networking/gptp.html). To kick-start this process, and to evaluate whether gPTP/ZephyrOS/i.MXRT is a good technical solution for synchronization, we asked a graduate student at Universitat Politècnica de Catalunya, Santi Prats, to help us.
Santi advanced significantly, first on installing the SDK and tools, then with compiling Zephyr and applications for i.MXRT and Kinetis, also with PTP and gPTP synchronization purely in Linux. Then he encountered a problem which stumped him and us.
To Reproduce
On the FRDM-K64F board (https://docs.zephyrproject.org/latest/boards/arm/frdm_k64f/doc/index.html), he is able to extract timestamps from the gPTP system with nanosecond resolution. But on the RT1020EVK (https://docs.zephyrproject.org/latest/boards/arm/mimxrt1020_evk/doc/index.html), using the same code, with the same libraries and tools, the nanosecond member of the net_ptp_time struct is NOT being filled, and therefore we are only able to read the PTP Hardware Clock with a resolution of seconds. Reproducing the issue is as simple as adding a few extra LOG_INF() calls to the gptp demo (see below), compiling it for the RT1020-EVK, flashing it, and running it.
Lightly modified demo code
/* USER BEGIN INCLUDES */
#include <net/ptp_time.h>
#include <sys/printk.h>
#include <sys/util.h>
/* USER END INCLUDES */
/* USER BEGIN VARIABLES */
static struct net_ptp_time slave_time;
//struct gptp_clk_src_time_invoke_params src_time_invoke_parameters;
bool gm_present;
int status;
/* USER END VARIABLES */
void main(void)
{
    /* USER BEGIN MAIN.C */
    while (1) {
        status = gptp_event_capture(&slave_time, &gm_present);
        LOG_INF(" ");
        LOG_INF(" ");
        LOG_INF("Standard info plot:");
        LOG_INF("gPTP event capture is %i", status); // 0 means no error
        LOG_INF("gPTP time second %u", slave_time.second);
        LOG_INF(" ");
        LOG_INF("Plot slave time SECONDS:");
        LOG_INF("gPTP slave time second (u) %u", slave_time.second);
        LOG_INF("gPTP slave time second (X) 0x%X", slave_time.second);
        LOG_INF(" ");
        LOG_INF("Plot slave time NANOSECONDS:");
        LOG_INF("gPTP slave time nanosecond (u) %u", slave_time.nanosecond);
        LOG_INF("gPTP slave time nanosecond (X) 0x%X", slave_time.nanosecond);
        LOG_INF(" ");
        LOG_INF("slave_time.second address: 0x%X", &(slave_time.second));
        LOG_INF("slave_time.nanosecond address: 0x%X", &(slave_time.nanosecond));
        k_msleep(1000); // sleep time in ms
    }
    /* USER END MAIN.C */
}
Build
$ west build -b mimxrt1020_evk samples/net/gptp/
$ west flash
Run
[00:04:59.182,000] <inf> net_gptp_sample:
[00:04:59.182,000] <inf> net_gptp_sample: Standard info plot:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP time second 1614051136
[00:04:59.182,000] <inf> net_gptp_sample:
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614051136
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x60347740
[00:04:59.182,000] <inf> net_gptp_sample:
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0
[00:04:59.182,000] <inf> net_gptp_sample:
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC0
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC8
uart:~$
Expected behavior
The expected behavior is what happens when running the same code on the FRDM-K64F:
[00:05:27.167,000] <inf> net_gptp_sample:
[00:05:27.167,000] <inf> net_gptp_sample: Standard info plot:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:05:27.167,000] <inf> net_gptp_sample: gPTP time second 1614072200
[00:05:27.167,000] <inf> net_gptp_sample:
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614072200
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x6034C988
[00:05:27.167,000] <inf> net_gptp_sample:
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 906384492
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x3605646C
[00:05:27.167,000] <inf> net_gptp_sample:
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC0
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC8
uart:~$
Note that on the FRDM-K64F, the nanosecond member IS being updated.
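The comparison between the two boards can also be checked programmatically instead of by reading log lines. What follows is a minimal host-side sketch in plain C, not Zephyr code. `struct ptp_sample`, `ptp_sample_to_ns()` and `nanosecond_looks_stuck()` are hypothetical names introduced here for illustration, and the struct only mirrors the two `net_ptp_time` fields that the demo reads. The idea is to fold each capture into a single 64-bit nanosecond count, and to flag a run of captures in which the seconds advance while the nanosecond field never changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the two net_ptp_time fields used in the demo. */
struct ptp_sample {
	uint64_t second;
	uint32_t nanosecond;
};

/* Fold seconds and nanoseconds into one 64-bit nanosecond count. */
uint64_t ptp_sample_to_ns(const struct ptp_sample *t)
{
	/* A valid capture never carries a full second in the nanosecond field. */
	assert(t->nanosecond < SKETCH_NSEC_PER_SEC);
	return t->second * SKETCH_NSEC_PER_SEC + t->nanosecond;
}

/*
 * Return true when the seconds field changes across the run while the
 * nanosecond field is identical in every sample, which is the symptom
 * reported for the RT1020EVK. Fewer than two samples cannot show it.
 */
bool nanosecond_looks_stuck(const struct ptp_sample *s, size_t n)
{
	if (n < 2) {
		return false;
	}
	for (size_t i = 1; i < n; i++) {
		if (s[i].nanosecond != s[0].nanosecond) {
			return false;
		}
	}
	return s[n - 1].second != s[0].second;
}
```

With the FRDM-K64F capture shown above (second 1614072200, nanosecond 906384492), `ptp_sample_to_ns()` yields 1614072200906384492. A run of captures such as the RT1020EVK one, where the nanosecond field stays at 0 while the seconds increase, would make `nanosecond_looks_stuck()` return true.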
Impact
We are unable to proceed with the development of the time synchronization element of our new sensor solution because of this issue. Santi is attempting to make equivalent functionality work in FreeRTOS to continue with his academic pursuits, but this is not a great solution for us, because other team members are developing other components of the sensor firmware in Zephyr. We are just getting started with Zephyr; we like what we see so far, but timestamp synchronization with gPTP is at the very heart of what we are trying to do with our sensors. If we cannot make it work, we will need to change our strategy. We reached out to NXP through our distributor EBV, and they have confirmed that the gPTP stack and demo should work in Zephyr on the RT1020-EVK. They encouraged us to report the issue here.
GDB / console output
Using the gdb debugger, Santi isolated the differences in the progress of the call to gptp_event_capture to a single line of code that differs between the FRDM-K64F and the MIMXRT1020-EVK. The relevant part of his (attached) document describing the problem follows:
GDB debugging
To investigate the differences between what is happening on the FRDM-K64F and the MIMXRT1020-EVK, the GDB debugger was used. The following was executed from the Zephyr directory:
$ west debug
We enter into “debug mode”
(gdb) layout src
(gdb) advance gptp_event_capture
Execution stops just before the call to
status=gptp_event_capture(&slave_time, &gm_present)
At this instant,
(gdb) print slave_time.second
(gdb) print slave_time.nanosecond
Both return 0. We now step through to see execution of gptp_event_capture()
(gdb) step
Stepping repeatedly with (gdb) step until the call returns shows that execution passes through the following files, in this order:
/include/arch/arm/aarch32/asm_inline_gcc.h line 56
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 62
#### [ The 1020EVK does NOT pass through line 62, unlike the FRDM-K64F ]
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 64
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 66
/subsys/net/l2/ethernet/ethernet.c line 1059
/include/net/net_if.h line 589
/subsys/net/l2/ethernet/ethernet.c line 1060
/subsys/net/l2/ethernet/ethernet.c line 1062
/include/net/net_if.h line 555
/subsys/net/l2/ethernet/ethernet.c line 1070
/drivers/ethernet/eth_mcux.c line 1083
/subsys/net/l2/ethernet/ethernet.c line 1074
/include/net/net_if.h line 1078
/include/net/net_if.h line 589
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 453
/modules/hal/nxp/mcux/drivers/imx/fsl_common.h line 566
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 209
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2857
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2826
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2828
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2833 - 2835
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2838
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2860 - 2862
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2866
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 481
/drivers/ethernet/eth_mcux.c line 1439
/drivers/ethernet/eth_mcux.c line 1440
/drivers/ethernet/eth_mcux.c line 1441
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 69
/include/arch/arm/aarch32/asm_inline_gcc.h line 95
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 70
/samples/net/gptp/src/main.c
Finally,
(gdb) print slave_time.second
(gdb) print slave_time.nanosecond
Now slave_time.second shows a correct value, while slave_time.nanosecond still reads 0:

In conclusion, we are seeing the desired behavior on the FRDM-K64F; we are able to read timestamps down to the nanosecond. The same application code, making the same API calls on the RT1020, does not return nanoseconds, however, and it all seems to come down to line 62 in gptp_user_api.c.
Environment:
- OS: Linux
- Toolchain: Zephyr SDK 0.12.2, West v0.9.0, cmake 3.16.3, pip 20.0.2, GDB
- Commit SHA: 2857c2e9845a328e2bee2be9005071e46db8b8bb
About this issue
- State: closed
- Created 3 years ago
- Comments: 239 (105 by maintainers)
Commits related to this issue
- Add pps sync mechanisms and optimize a few lines of code This is not good enough, pps keeps drifting — committed to ainguraXmarquiegui/zephyr by ainguraXmarquiegui 3 years ago
- gptp: eth_mcux: update driver for gptp 1. do noe use gptp lower api to change the HW clock 2. using sofware to maintain the time drift fixing: #33747 Signed-off-by: Hake Huang <hake.huang@oss.nxp.c... — committed to nxp-zephyr/zephyr by hakehuang a year ago
- gptp: eth_mcux: update driver for gptp 1. do noe use gptp lower api to change the HW clock 2. using sofware to maintain the time drift 3. use a coinfig to control the HW timer set enable as soem c... — committed to nxp-zephyr/zephyr by hakehuang a year ago
This looks like some kind of stack issue
Good news @hakehuang ! I have only been able to do a couple of quick tests, but it seems that the errors I was seeing in my previous tests are gone with your latest code base.
I have tested it on the imxrt1020evk and imxrt1050evk for a few minutes. When I am less busy I will try to do longer duration tests, and maybe try to verify that time synchronization is working. I just wanted to do a quick test run to let you know if there was something obvious to report or not.
@hakehuang , here’s the pull request:
https://github.com/zephyrproject-rtos/zephyr/pull/42216
@hakehuang Thank you very much for your work. I will gladly test it, but I need time.
I won’t have access to the resources I need to conduct the test until December 13. I will let you know the result once I do the test.
I need more time to test. I’ll get back to you when I have test results.
@ainguraXmarquiegui , sorry, some more updates from our internal experts are under review. I will update the code once the review is approved.
I’m sorry @hakehuang . I just saw your latest post edit. When you edit a post I don’t get notifications.
I’d like to request that next time that you add relevant information, please post it as a new entry, and do not edit posts. This will improve my response time.
I will have a look at your newest modifications now.
Jitter is still in the range of 30 µs to 70 µs on short test runs. There is no noticeable difference in jitter between running off USB power and running off a standalone PSU.
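For readers who want to reproduce a figure of this kind from logged data rather than from an oscilloscope, here is a minimal sketch. It assumes that "jitter" means the peak-to-peak spread of the signed offset between the master's PPS edge and the board's PPS edge; that definition, the function name `pps_jitter_pk_pk_ns()` and the nanosecond unit are assumptions made here, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Peak-to-peak spread, in nanoseconds, of a series of signed PPS edge
 * offsets (board edge minus master edge). Hypothetical helper for
 * illustration; requires at least one sample.
 */
int64_t pps_jitter_pk_pk_ns(const int64_t *offset_ns, size_t n)
{
	assert(offset_ns != NULL && n > 0);

	int64_t lo = offset_ns[0];
	int64_t hi = offset_ns[0];

	for (size_t i = 1; i < n; i++) {
		if (offset_ns[i] < lo) {
			lo = offset_ns[i];
		}
		if (offset_ns[i] > hi) {
			hi = offset_ns[i];
		}
	}
	return hi - lo;
}
```

As an illustration with made-up numbers (not measurements from these tests), offsets of -20000, 15000, 30000 and -5000 ns give a spread of 50000 ns, that is 50 µs.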
Small correction on my previous post. The system will be running all weekend on your code, plus this small modification. I hope it’s ok:
@hakehuang , I have done all my tests with the PPS code on because I was trying to verify the long-term accuracy of the synchronization. Since every failure shows up both on the oscilloscope and on the system console, I can repeat the test without the PPS code.
I will try to leave the system running all weekend with your code as is, without the PPS added modifications.
@hakehuang , I posted something, but I just deleted my last post because I realized that I had made a couple of fundamental mistakes in my observations.
I am still doing some tests. Good news is with your latest suggestions interrupts seem to work again and we can use Output Compare and Input Capture features of the imxrt. Bad news is I have not seen improvements in jitter so far.
These are still very early observations. I’m still performing tests. I will get back to you when I have completed more tests.
Hi @dleach02, @cfriedt, @hakehuang, and friends,
We have been trying to get PPS generation working in Zephyr, as requested, and it is taking us longer than we would have liked. Also, @ainguraXmarquiegui is out of the office this week, and I will be out of the office next week. I anticipate that @ainguraXmarquiegui should be able to see if #35328 has a beneficial effect when he returns next week.
@dleach02 , I’m sorry. I have been away for a few days. I’m back now. I want to run a few more tests and I will get back to you before the week ends.
Hello @dleach02, hi everyone,
I can add a little more detail to what @ainguraXmarquiegui just posted. I think the NIC capabilities and firmware are definitely the most important details, but in case you need some more information…
@ainguraXmarquiegui and I both have the same model of laptop for development, an HP ZBOOK 15 with a Core i9-9880H. We are both using Ubuntu 20.04. The wired Ethernet interface over which we are running gPTP is an Intel I219-LM (seen above as enp0s31f6). We are not using that interface to connect to the Internet or other wired networks, and so far we are not using any Ethernet switches between gPTP endpoints for these tests; we connect the FRDM or the i.MXRT EVK directly to the RJ45 jack of our laptops with a short Cat6 cable.
In terms of software, we are using Avnu gptp version v1.0.0-30-g0baef8a, cloned from the Avnu repo: https://github.com/Avnu/gptp
and we launch the daemon like this:
sudo ./gptp enp0s31f6 -F gptp_cfg.ini
@ainguraXmarquiegui and @hakehuang have discussed changing things in their gptp_cfg.ini files, but this is the one I am using:
gptp_cfg.ini.txt