usb-serial-for-android: Error message "USB get_status request failed" after a few hours working fine
Hi, I’m developing an application that connects to a device through USB and then sends a command (the same command) over and over again in intervals of 30 seconds.
Everything seems to work as expected for some hours, sometimes almost 24 hours working without any problem, but then we suddenly start to get this error USB get_status request failed. I searched for this error message in your repo and I found some people that encountered the same error message, but two cases in particular seem somewhat similar to mine:
In this first issue the OP reported that everything was working initially but later it stopped after 2 hours; on this second issue the OP mentions that everything stopped working after 5 hours.
I find intriguing that in all theses cases, including mine, everything is working for a while, but then out of nowhere it stops working after some hours. I know this is a long shot, but do you have any idea what could be causing these problems?
Thanks!
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 2
- Comments: 43
requestWaitwith timeout has other issues and as I anyway need the old implementation for older devices I did not switch or investigate if it got better with more recent Android versions: https://github.com/mik3y/usb-serial-for-android/blob/34e6d989fdf8d542981c0d1ad9c93027c9be7063/usbSerialForAndroid/src/main/java/com/hoho/android/usbserial/driver/CommonUsbSerialPort.java#L186All real devices.
@jhlink Yes, you’re right, the project that I’m working on is proprietary. However, the main class that makes the connection with the USB device and also read/writes data, comes from an open source Android library that I created to help me during the development of my projects.
You can find this class called
UsbHelperhere: https://github.com/vegidio/android-sak/blob/main/util/src/main/kotlin/io/vinicius/sak/util/UsbHelper.ktYou will notice that I have a variable
port: UsbSerialPortin this class, and this variable reads/writes data in the functionsread()andwrite(); I added the annotation@Synchronizeto both functions.But just to make sure that I was covering all possible edge cases, I also added
@Synchronizeto the functionsconnect()anddisconnect()because even though they don’t read or write any data in the USB device, they also access theport: UsbSerialPortvariable.Sorry, but this is the only part of my code that I can share. However, if you could share your example I will compare with the rest of my project and see if there’s something else I’m missing.
I can definitely leave my device on for much longer. However, I hate to write this, but less than 2 hours after my last update I lost connection with the USB device 😢
There are only two places in my code where I call
UsbSerialPortread and write functions and I added synchronize to both places, but I will go through the code again to see if it benefits from adding synchronize in other areas. In any case, thank you very much for the support you have provided so far.@jhlink Ok, I did the change that you suggested and used synchronize in all places where
UsbSerialPortreads or writes something.Before this change, my app could run for around 16 hours before losing connection with the USB device, but there was a case where it ran for almost a full day (23 hours to be precise) before it lost connection. However, this happened only once; most of the time the connection is lost around the 16 hour mark.
Now, with the change that you suggested, my app is already running for 24 hours without losing connection, which is 1 hour more than the max amount of time it ever remained connected 🎉
I’m feeling carefully optimistic, but I think you found the source of the problem. I will let my app run for 24 hours more and I will report again tomorrow. 🙂
@jhlink Thank you very much for looking into this. I will test on Monday and come back with a feedback soon!
@kai-morich Status update. I spent the day reproducing the failure with a USB protocol analyzer. I seem to have found a fix for the issue, but @vegidio it would be great if you could test this to prove that it works. (Hope you’re family is doing okay. )
Overview
Reproduced the failure using the example project in the repository. It was modified to force the failure through heavy read/write throughput operations. An IO exception occurred within seconds, which indicated packet corruption due to concurrency issues. Using
synchronizedon the USBSerialPort instance within the serial write thread seems to have resolved the issue. Over 50GB of USB traffic data was collected using the USB protocol analyzer given the following:After applying the modifier, this setup was run for 45+ minutes with no other connection loss or failures observed on the application, USB traffic, or on the device.
@vegidio , could you apply the synchronized flag on your
UsbSerialPortinstance in the thread that’s writing to the serial port and observe if the failure appears for you?Methodology
Attempt 1 - get_status request failure with Legacy code
Initially, I used the USB Protocol Analyzer (USBPA for short) on the legacy software I’m maintaining. I was able to reproduce the USB
get_status_requestissue, but the traffic indicated that during a bulk transfer of a 300 byte payload, a baud rate configuration control transfer was made; it contained theSET_LINE_CODINGcode found insetParameters. 2.5 seconds later, which is the read timeout I’ve specified, a series of SETUP request failures occurred before succeeding about 2ms later. This is the GET_STATUS request that’s being sent in thetest_connectionfunction. The USBPA identified these packet failures as an invalid PID sequence because of the following:Despite the invalid PID error above, the root cause for the unexpected SET_LINE_CODING control transfer is likely due to a bug or logical failure in the application, not in the library or the hardware layer.
Summary: Do not call
setParametersduring ongoing traffic, which should be a self-evident principle in any communication protocol. With respect to the library, it is odd whysetParameterscan only be set after the UsbSerialPort is opened. I’m guessing the reason is to ensure maximum device compatibility using a predefined, default set of values to open the device before changing it later. It would be nice if the “setParameters” can be called before “open,” such that the desired baud rate and port configuration is used when the port is opened or if the configuration can be part of theopenfunction header.Forcing the developer to change port configs (if the default 9600 baud doesn’t apply) after port opening seems to allow for a category of software bugs.
Attempt 2 - force failure with example code
In order to prove there is a software bug in the library, I’ve modified the example project in the library. I’ve removed everything UI related to test the core issue. A dedicated HandlerThread was created for writing data to the
UsbSerialPortand handling the data received fromOnNewData.Without incorporating the
synchronizedmodifier on the UsbSerialPort instance, I experienced ajava.io.IOException: Error writing 19 bytes at offset 32 of total 51, which indicated that packet corruption was occurring due to concurrency issues.After adding the modifier and locking inter-thread write access to the UsbSerialPort, I didn’t experience this issue even after aggressively high, bidirectional throughput.
Summary: UsbSerialPort is not thread-safe. If read/write operations are placed in a separate thread, a mutex lock will be required on the port itself (not their buffers) to avoid undefined behavior caused by concurrency issues. Reviewing the example code for SimpleUsbTerminal, I’ve observed a similar usage of
synchronizedwithin Serial Reads/Writes as data is being received or sent across threads.An argument can be made that a mutex lock may not be necessary if there is only one thread that handles serial writes. Given a bona fide RTOS, I’d agree, circumstantially. However, Android is not an RTOS, and the OS can kill/restart any thread at any time unless bound to a service or tied to the UI. Given that I, as the developer, cannot predict or guarantee thread termination/restarts, a mutex lock is required.
Conclusion
Preliminary tests indicate that the failure is due to concurrency issues in accessing UsbSerialPort, which is apparently not thread-safe. The issue isn’t necessarily a software bug in Android’s API, although greater detail in error propagation in the USB API would alleviate the need to buy a $3k+ tool to root cause USB failures.
If any developer is splitting read/write operations into dedicated threads, they should take care to incorporate mutex locks. Otherwise, if the developer is handling all read/write operations on the main thread, the application won’t see failures due to concurrency issues, though there will likely be failures due to high load on the main thread.
Suggestions
synchronizedto avoid undefined failures.opensuch that the port configuration can be applied when a port is opened. Eliminates the possibility of callingsetParametersduring ongoing USB traffic.Next Steps
I’ll incorporate the synchronized change in legacy code, and after a ground-up redesign, I will test the overall solution to see if the
get_statusrequest will occur.I’ll wait to see if proposed change works for @vegidio .
Folks, I want to find a solution for this problem like everyone else in this thread, however let’s not act as if this is a paid service and the people working on this own us anything.
I’m sure that whenever a solution is found an update will be posted here.
Ok, now that I’m back investigating this I found something yesterday that might give some hints of what’s happening, at least in my case.
First I would like to share a piece of my code:
and then I get the
UsbDevicesimply doing this:The code above works great when I start using my app, but when it stops responding (I’ve put some code to detect that), I try to reconnect. However, this time the same code above to find the driver associated with my USB device doesn’t find any result for
manufacturerName = Ingenico.The list
allDriversin the code above is not empty (I have 2 USB devices connected to my Android device), but the first time I run this code, the list has 2 devices (Ingenico and another one). However, when Ingenico stops responding, the same code returns only 1 device (not Ingenico).So, is it safe to assume that whatever is causing the loss of connectivity with the device is not a problem in the source code? I mean, when I lose connectivity, I cannot even try to reconnect to the device because it no longer appears in the list of available drivers, but the other USB device is still there. It gives me the impression that my Android phone and my code are working as expected; it’s the USB device that has entered an unrecoverable state somehow.
@Mohammedbinnazar , I should be back on this later this week. Hopefully, today once I dial in the SW changes for unrelated dev.