usb-serial-for-android: Error message "USB get_status request failed" after a few hours working fine

Hi, I’m developing an application that connects to a device through USB and then sends a command (the same command) over and over again in intervals of 30 seconds.

Everything seems to work as expected for some hours, sometimes almost 24 hours working without any problem, but then we suddenly start to get this error USB get_status request failed. I searched for this error message in your repo and I found some people that encountered the same error message, but two cases in particular seem somewhat similar to mine:

In this first issue the OP reported that everything was working initially but later it stopped after 2 hours; on this second issue the OP mentions that everything stopped working after 5 hours.

I find intriguing that in all theses cases, including mine, everything is working for a while, but then out of nowhere it stops working after some hours. I know this is a long shot, but do you have any idea what could be causing these problems?

Thanks!

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 43

Most upvoted comments

Since API lvl 26, requestWait has that option .

requestWait with timeout has other issues and as I anyway need the old implementation for older devices I did not switch or investigate if it got better with more recent Android versions: https://github.com/mik3y/usb-serial-for-android/blob/34e6d989fdf8d542981c0d1ad9c93027c9be7063/usbSerialForAndroid/src/main/java/com/hoho/android/usbserial/driver/CommonUsbSerialPort.java#L186

Were these on emulators or on actual hardware?

All real devices.

@jhlink Yes, you’re right, the project that I’m working on is proprietary. However, the main class that makes the connection with the USB device and also read/writes data, comes from an open source Android library that I created to help me during the development of my projects.

You can find this class called UsbHelper here: https://github.com/vegidio/android-sak/blob/main/util/src/main/kotlin/io/vinicius/sak/util/UsbHelper.kt

You will notice that I have a variable port: UsbSerialPort in this class, and this variable reads/writes data in the functions read() and write(); I added the annotation @Synchronize to both functions.

But just to make sure that I was covering all possible edge cases, I also added @Synchronize to the functions connect() and disconnect() because even though they don’t read or write any data in the USB device, they also access the port: UsbSerialPort variable.

Sorry, but this is the only part of my code that I can share. However, if you could share your example I will compare with the rest of my project and see if there’s something else I’m missing.

I can definitely leave my device on for much longer. However, I hate to write this, but less than 2 hours after my last update I lost connection with the USB device 😢

There are only two places in my code where I call UsbSerialPort read and write functions and I added synchronize to both places, but I will go through the code again to see if it benefits from adding synchronize in other areas. In any case, thank you very much for the support you have provided so far.

@jhlink Ok, I did the change that you suggested and used synchronize in all places where UsbSerialPort reads or writes something.

Before this change, my app could run for around 16 hours before losing connection with the USB device, but there was a case where it ran for almost a full day (23 hours to be precise) before it lost connection. However, this happened only once; most of the time the connection is lost around the 16 hour mark.

Now, with the change that you suggested, my app is already running for 24 hours without losing connection, which is 1 hour more than the max amount of time it ever remained connected 🎉

I’m feeling carefully optimistic, but I think you found the source of the problem. I will let my app run for 24 hours more and I will report again tomorrow. 🙂

@jhlink Thank you very much for looking into this. I will test on Monday and come back with a feedback soon!

@kai-morich Status update. I spent the day reproducing the failure with a USB protocol analyzer. I seem to have found a fix for the issue, but @vegidio it would be great if you could test this to prove that it works. (Hope you’re family is doing okay. )

Overview

Reproduced the failure using the example project in the repository. It was modified to force the failure through heavy read/write throughput operations. An IO exception occurred within seconds, which indicated packet corruption due to concurrency issues. Using synchronized on the USBSerialPort instance within the serial write thread seems to have resolved the issue. Over 50GB of USB traffic data was collected using the USB protocol analyzer given the following:

  • The host was writing a 100+ byte payload every ms and reading any data arriving in a separate thread.
  • The device, which communicated to the host via a USB Hub, was reading 50 bytes and writing a 100+ bytes payload on its serial port at 10 ms intervals.

After applying the modifier, this setup was run for 45+ minutes with no other connection loss or failures observed on the application, USB traffic, or on the device.

@vegidio , could you apply the synchronized flag on your UsbSerialPort instance in the thread that’s writing to the serial port and observe if the failure appears for you?

Methodology

Attempt 1 - get_status request failure with Legacy code

Initially, I used the USB Protocol Analyzer (USBPA for short) on the legacy software I’m maintaining. I was able to reproduce the USB get_status_request issue, but the traffic indicated that during a bulk transfer of a 300 byte payload, a baud rate configuration control transfer was made; it contained the SET_LINE_CODING code found in setParameters. 2.5 seconds later, which is the read timeout I’ve specified, a series of SETUP request failures occurred before succeeding about 2ms later. This is the GET_STATUS request that’s being sent in the test_connection function. The USBPA identified these packet failures as an invalid PID sequence because of the following:

  1. The device failed to send the NACK response after the SETUP status transaction instead of the SETUP DATA0 transaction.
  2. Missing the CSPLIT transaction since the device was communicating through a USB Hub.

Despite the invalid PID error above, the root cause for the unexpected SET_LINE_CODING control transfer is likely due to a bug or logical failure in the application, not in the library or the hardware layer.

Summary: Do not call setParameters during ongoing traffic, which should be a self-evident principle in any communication protocol. With respect to the library, it is odd why setParameters can only be set after the UsbSerialPort is opened. I’m guessing the reason is to ensure maximum device compatibility using a predefined, default set of values to open the device before changing it later. It would be nice if the “setParameters” can be called before “open,” such that the desired baud rate and port configuration is used when the port is opened or if the configuration can be part of the open function header.

Forcing the developer to change port configs (if the default 9600 baud doesn’t apply) after port opening seems to allow for a category of software bugs.

Attempt 2 - force failure with example code

In order to prove there is a software bug in the library, I’ve modified the example project in the library. I’ve removed everything UI related to test the core issue. A dedicated HandlerThread was created for writing data to the UsbSerialPort and handling the data received from OnNewData.

Without incorporating the synchronized modifier on the UsbSerialPort instance, I experienced a java.io.IOException: Error writing 19 bytes at offset 32 of total 51, which indicated that packet corruption was occurring due to concurrency issues.

After adding the modifier and locking inter-thread write access to the UsbSerialPort, I didn’t experience this issue even after aggressively high, bidirectional throughput.

Summary: UsbSerialPort is not thread-safe. If read/write operations are placed in a separate thread, a mutex lock will be required on the port itself (not their buffers) to avoid undefined behavior caused by concurrency issues. Reviewing the example code for SimpleUsbTerminal, I’ve observed a similar usage of synchronized within Serial Reads/Writes as data is being received or sent across threads.

An argument can be made that a mutex lock may not be necessary if there is only one thread that handles serial writes. Given a bona fide RTOS, I’d agree, circumstantially. However, Android is not an RTOS, and the OS can kill/restart any thread at any time unless bound to a service or tied to the UI. Given that I, as the developer, cannot predict or guarantee thread termination/restarts, a mutex lock is required.

Conclusion

Preliminary tests indicate that the failure is due to concurrency issues in accessing UsbSerialPort, which is apparently not thread-safe. The issue isn’t necessarily a software bug in Android’s API, although greater detail in error propagation in the USB API would alleviate the need to buy a $3k+ tool to root cause USB failures.

If any developer is splitting read/write operations into dedicated threads, they should take care to incorporate mutex locks. Otherwise, if the developer is handling all read/write operations on the main thread, the application won’t see failures due to concurrency issues, though there will likely be failures due to high load on the main thread.

Suggestions

  • Update the Readme to warn developers about concurrency issues, and use synchronized to avoid undefined failures.
  • Modify open such that the port configuration can be applied when a port is opened. Eliminates the possibility of calling setParameters during ongoing USB traffic.

Next Steps

I’ll incorporate the synchronized change in legacy code, and after a ground-up redesign, I will test the overall solution to see if the get_status request will occur.

I’ll wait to see if proposed change works for @vegidio .

Folks, I want to find a solution for this problem like everyone else in this thread, however let’s not act as if this is a paid service and the people working on this own us anything.

I’m sure that whenever a solution is found an update will be posted here.

Ok, now that I’m back investigating this I found something yesterday that might give some hints of what’s happening, at least in my case.

First I would like to share a piece of my code:

  1. First I try to find the driver associated with the USB device that I’m using:
val driver: UsbSerialDriver by lazy {
    val allDrivers = UsbSerialProber.getDefaultProber().findAllDrivers(usbManager)
    if (allDrivers.isEmpty()) error("No serial USB devices!")
    // My USB device is from Ingenico
    allDrivers.first { it.device.manufacturerName.equals("ingenico", ignoreCase = true) }
}

and then I get the UsbDevice simply doing this:

val device get() = driver.device

The code above works great when I start using my app, but when it stops responding (I’ve put some code to detect that), I try to reconnect. However, this time the same code above to find the driver associated with my USB device doesn’t find any result for manufacturerName = Ingenico.

The list allDrivers in the code above is not empty (I have 2 USB devices connected to my Android device), but the first time I run this code, the list has 2 devices (Ingenico and another one). However, when Ingenico stops responding, the same code returns only 1 device (not Ingenico).

So, is it safe to assume that whatever is causing the loss of connectivity with the device is not a problem in the source code? I mean, when I lose connectivity, I cannot even try to reconnect to the device because it no longer appears in the list of available drivers, but the other USB device is still there. It gives me the impression that my Android phone and my code are working as expected; it’s the USB device that has entered an unrecoverable state somehow.

@Mohammedbinnazar , I should be back on this later this week. Hopefully, today once I dial in the SW changes for unrelated dev.