pyOCD: Performance regression with CMSIS-DAP v2 between release 0.32.3 and 0.33.0

I’m experiencing extremely poor performance with my CMSIS-DAP v2 probe, which is an NXP MCU-Link running a current build of DAPLink. I’ve tested both the latest release and the develop branch, and both have the same issue.

If I force pyOCD to use CMSIS-DAP v1 by adding cmsis_dap.prefer_v1: True to pyocd.yaml, the performance issue is resolved.

Reverting to old versions of pyOCD, I noticed the performance regression occurred between release 0.32.3 and 0.33.0. It looks like pyOCD has had CMSIS-DAP v2 support going back to 0.17.0, so I assumed the 0.32.3 build is talking to my probe over the bulk interface — but I’m not seeing any performance improvement in 0.32.3 (which is presumably using the v2 protocol) versus a newer build handicapped with prefer_v1 set, so I’m not so sure.

To test, I’m starting with an erased ATSAMG55J19 chip and a 512KB test.bin file and I run

C:\> pyocd load -t atsamg55j19 -f20m test.bin

The performance drops from 45-50 KB/s to less than 3 KB/s.

I investigated a bit with my logic analyzer and found that with pyOCD version 0.32.3, SWD packets are tightly packed together, with only a few hundred microseconds between transfers. It takes 14 ms to perform these 37 transfers: [screenshot: PyOCD-0 32 3]

But with version 0.33.0 or newer, the same number of transactions takes several seconds. There is a delay of about 15 ms between consecutive transactions, and it takes more than 400 ms to perform 27 transfers: [screenshot: PyOCD-0 33 0]
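As a rough sanity check, the per-transfer gap can be turned into an expected total time. This is only back-of-the-envelope arithmetic on the figures quoted above; the function name is made up for illustration, and the two captures cover different transfer counts (37 and 27), so only the per-transfer figures are directly comparable.

```python
def total_time_ms(transfers: int, gap_ms: float) -> float:
    """Time spent if every transfer costs one fixed gap."""
    return transfers * gap_ms

# 0.33.0 capture: 27 transfers with about 15 ms between them.
slow_total = total_time_ms(27, 15.0)   # 405.0 ms, consistent with "more than 400 ms"

# 0.32.3 capture: 37 transfers in 14 ms total.
fast_gap = 14.0 / 37                   # average time per transfer, in ms

print(f"0.33.0: {slow_total:.0f} ms for 27 transfers")
print(f"0.32.3: {fast_gap:.2f} ms per transfer on average")
print(f"per-transfer ratio: {15.0 / fast_gap:.1f}x")
```

The 15 ms gap alone accounts for the 400 ms figure, which suggests the time is lost between transfers rather than on the SWD wire itself.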

I haven’t looked at pyOCD’s source code, but I’ve seen similar issues in USB-based projects for a variety of reasons. The most common one is I/O calls that aren’t executed as overlapping transfers.

While I have many other CMSIS-DAP-compatible probes lying around, none of them support CMSIS-DAP v2, so I can’t eliminate the DAPLink firmware running on the MCU-Link as a possible culprit, either.

In case you want to examine the Saleae captures directly, here they are: PyOCD 0.32.3 vs 0.33.0 Saleae Captures.zip

About this issue

  • State: closed
  • Created a year ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

@Guozhanxin Thanks very much for testing! I’ll clean the patch up a bit before I commit it. Similar changes also need to be made to the other CMSIS-DAP USB drivers.

It’s unfortunate the speed is a little lower, but some form of synchronisation is necessary to ensure thread safety. I’ll think about other ways to handle this safely, with better performance. But this is probably what the patch will look like for the next release.

Thanks again for your help! 🙏🏽

I will test it later.

@Guozhanxin @jaydcarlson

If you have a chance, could you please try out the attached patch against commit 376d3f4? Thanks!

pyOCD-bcc744a-cmsis-dap_v2_queue.patch

@flit I have tested this. Good news: the download speed of daplink-v2 under this version has increased to 39.04 kB/s.

PS E:\workspace\HMI-Board\pyocd-test> pyocd flash --target=R7FA6M3AH --erase=auto --frequency=1000000 E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex
0000929 I Loading E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex [load_cmd]
[==================================================] 100%
0017239 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 39.04 kB/s [loader]

And, the download speed of daplink-v1 under this version has not changed.

PS E:\workspace\HMI-Board\pyocd-test> pyocd flash --target=R7FA6M3AH --erase=auto --frequency=1000000 E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex
0000958 I Loading E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex [load_cmd]
[==================================================] 100%
0038204 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 16.90 kB/s [loader]

And when I change the pyOCD frequency to 10000000, the download speed of daplink-v2 under this version increases to 80.02 kB/s.

PS E:\workspace\HMI-Board\pyocd-test> pyocd flash --target=R7FA6M3AH --erase=auto --frequency=10000000 E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex
0000909 I Loading E:\workspace\HMI-Board\ra6m3_project\project\rtthread.hex [load_cmd]
[==================================================] 100%
0009042 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 80.02 kB/s [loader]
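To compare runs like the three above without reading the numbers by eye, the loader summary lines can be parsed with a small script. This is a sketch that assumes the summary line keeps exactly the format shown in these logs; the regular expression and the `parse_summary` helper are illustrative and not part of pyOCD.

```python
import re

# Matches the "[loader]" summary line as it appears in the logs above.
SUMMARY = re.compile(
    r"^(?P<ms>\d+) I Erased .*?skipped (?P<skipped>\d+) bytes .*? at (?P<rate>[\d.]+) kB/s"
)

def parse_summary(line: str):
    """Return (timestamp_ms, skipped_bytes, rate_kB_s), or None if no match."""
    m = SUMMARY.match(line)
    if m is None:
        return None
    return int(m["ms"]), int(m["skipped"]), float(m["rate"])

# Summary lines copied from the three runs above.
V2_1MHZ = "0017239 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 39.04 kB/s [loader]"
V1_1MHZ = "0038204 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 16.90 kB/s [loader]"
V2_10MHZ = "0009042 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 638864 bytes (4992 pages) at 80.02 kB/s [loader]"

baseline = parse_summary(V1_1MHZ)[2]
for name, line in [("v2 @ 1 MHz", V2_1MHZ), ("v1 @ 1 MHz", V1_1MHZ), ("v2 @ 10 MHz", V2_10MHZ)]:
    _, skipped, rate = parse_summary(line)
    print(f"{name}: {rate:.2f} kB/s over {skipped} bytes ({rate / baseline:.2f}x the v1 rate)")
```

Note that all three runs report every page as skipped and 0 bytes programmed, so these rates come from runs where the loader skipped the data rather than programmed it.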

@Guozhanxin Thanks for testing! That pretty much confirms that it’s something related to libusb usage or the CMSIS-DAPv2 pyusb backend. CMSIS-DAPv1 isn’t affected because it uses hidapi for HID class on Windows.

What’s really interesting is that you used libusb-package for v0.32.3. That actually really helps, because it shows that the problem is not caused by an issue with the version of libusb in libusb-package.

Thanks for the report, and apologies for issues this has caused.

It looks like you are using Windows?

Regarding overlapping transfers: the pyusb package used by pyocd (and most other python based tools using USB) doesn’t support asynchronous transfers (required for overlapping transfers) even though the underlying libusb does provide an asynchronous API. This is supported by another Python wrapper around libusb, but would obviously require a pretty big change to pyocd.

Still, pyocd does use an independent receive thread that always keeps a receive request open (unless the request times out due to no IN packets, in which case it’s immediately re-issued). While this isn’t really asynchronous in the same way, it does, or at least is intended to, achieve much the same result.
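The receive-thread pattern described above can be sketched roughly as follows. This is not pyOCD's actual code: `FakeEndpoint` and `Receiver` are hypothetical stand-ins, and the fake endpoint only imitates a blocking read with a timeout so that the example runs without any USB hardware.

```python
import queue
import threading

class FakeEndpoint:
    """Stand-in for a bulk IN endpoint (hypothetical, not a pyusb class)."""
    def __init__(self):
        self._packets = queue.Queue()

    def push(self, data: bytes) -> None:
        """Simulate the device sending an IN packet."""
        self._packets.put(data)

    def read(self, timeout: float) -> bytes:
        """Blocking read that raises TimeoutError when nothing arrives."""
        try:
            return self._packets.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no IN packet")

class Receiver:
    """Keeps a read request open at all times on a background thread."""
    def __init__(self, endpoint: FakeEndpoint):
        self._ep = endpoint
        self.received = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                data = self._ep.read(timeout=0.05)
            except TimeoutError:
                continue  # timed out with no data: re-issue the read at once
            self.received.put(data)

    def close(self) -> None:
        self._stop.set()
        self._thread.join()

ep = FakeEndpoint()
rx = Receiver(ep)
ep.push(b"\x05\x01")                      # device answers at some later point
response = rx.received.get(timeout=1.0)   # caller picks the response up
rx.close()
print(response)
```

Because the read is already pending when the device responds, the caller does not pay the cost of issuing a new read after each write, which is the effect overlapping transfers would otherwise provide.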

In addition, pyocd uses a feature of the CMSIS-DAP protocol where there can be multiple outstanding requests, although they are always issued and responses received in requested order (otherwise you’d have a mess). The actual utility of this is limited based on the operations being performed; often the result of one is required before the next can be issued. But it does help with things like flash programming. However(!), at one point in the past (a few years ago, can’t remember the version) DAPLink lost its ability to support multiple outstanding CMSIS-DAP requests in an attempt to achieve much greater stability by serialising everything through its main thread (that worked… it’s very stable now, but not as fast as it could be).
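A minimal model of "multiple outstanding requests, answered in order" might look like this. Everything here is hypothetical (`FakeProbe`, `run_pipelined`, and the `max_outstanding` parameter are invented for the sketch); it only illustrates the bookkeeping, not pyOCD's or DAPLink's implementation.

```python
from collections import deque

class FakeProbe:
    """Hypothetical probe that answers commands strictly in FIFO order."""
    def __init__(self):
        self._inbox = deque()
        self.peak_in_flight = 0

    def send(self, cmd: int) -> None:
        self._inbox.append(cmd)
        self.peak_in_flight = max(self.peak_in_flight, len(self._inbox))

    def receive(self) -> int:
        return self._inbox.popleft() + 100   # dummy "response" to the oldest command

def run_pipelined(probe: FakeProbe, commands, max_outstanding: int):
    """Send commands, keeping at most max_outstanding unanswered at any time."""
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    in_flight = 0
    responses = []
    for cmd in commands:
        if in_flight == max_outstanding:
            # Queue is full: collect the oldest response before sending more.
            responses.append(probe.receive())
            in_flight -= 1
        probe.send(cmd)
        in_flight += 1
    while in_flight:
        responses.append(probe.receive())
        in_flight -= 1
    return responses

probe = FakeProbe()
responses = run_pipelined(probe, [1, 2, 3, 4, 5], max_outstanding=2)
print(responses, "peak in flight:", probe.peak_in_flight)
```

With `max_outstanding=1` this degenerates to strict request/response, which corresponds to the serialised behaviour described for newer DAPLink firmware. Pipelining only helps when later commands do not depend on earlier responses, as in flash programming.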