plc4x: [Bug]: plc4j-tools-connection-cache: broken connections remain in the cache on timeout
What happened?
Summary
When a connection stored in the connection-cache breaks due to a network failure, the connection is not removed from the cache and blocks future uses of the same connection string.
Context
I encountered this while trying to solve a problem similar to https://github.com/apache/plc4x/issues/623 in the NiFi integration: when a processor is running and the network connection to the PLC is interrupted, the processor continues to throw errors even after the network connection is restored.
This was brought up in a mail by me (https://lists.apache.org/thread/xm38nh8xzh1m1kj0y74dx0goo81cos82) that sparked a pull request by heyoulin (https://github.com/apache/plc4x/pull/818), an issue by splatch (https://github.com/apache/plc4x/issues/821) and a commit from @chrisdutz (https://github.com/apache/plc4x/commit/9b06c2de0c77a7c1bbcb730bb5285c4435002c93).
The commit (https://github.com/apache/plc4x/commit/9b06c2de0c77a7c1bbcb730bb5285c4435002c93) did not fully address the problem, so here is my attempt to fix it.
Replicate the problem
To replicate the problem, run the code at the end of this report and follow these steps:
- Start the main below
- Disconnect network
- Wait until errors are shown in the stdout
- You will see that the connection is still being reused after it fails:
16:38:22.486 [main] DEBUG o.a.p.j.u.c.CachedPlcConnectionManager.getConnection:72 - Reusing exising connection
Failed to read due to:
java.util.concurrent.TimeoutExceptio
- Reconnect network. The problem persists.
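The behavior seen in these steps can be modelled without PLC4X at all. The following is a minimal sketch, not the actual CachedPlcConnectionManager code: ToyConnection, ToyCache, networkDropped and invalidate are hypothetical names made up for illustration. It assumes only that the cache is keyed by connection string and that a broken connection object never recovers on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a PLC connection; not a PLC4X class.
class ToyConnection {
    private boolean broken = false;

    // Simulates the socket dying. The object stays broken even if the network returns.
    void networkDropped() {
        broken = true;
    }

    String read() {
        if (broken) {
            throw new IllegalStateException("timeout");
        }
        return "value";
    }
}

// Hypothetical cache keyed by connection string; not the PLC4X implementation.
class ToyCache {
    private final Map<String, ToyConnection> connections = new HashMap<>();

    // Reuses the cached instance whenever one exists for this connection string.
    ToyConnection getConnection(String url) {
        return connections.computeIfAbsent(url, key -> new ToyConnection());
    }

    // The step that is missing on timeout: drop the entry so a fresh connection is created.
    void invalidate(String url) {
        connections.remove(url);
    }
}

class StaleCacheDemo {
    public static void main(String[] args) {
        ToyCache cache = new ToyCache();
        String url = "s7://example"; // placeholder connection string

        ToyConnection first = cache.getConnection(url);
        first.networkDropped();

        // Nobody told the cache, so the same dead instance is handed out again.
        System.out.println("same instance reused: " + (cache.getConnection(url) == first));

        // Once the entry is invalidated, the next lease gets a working connection.
        cache.invalidate(url);
        System.out.println("read after invalidate: " + cache.getConnection(url).read());
    }
}
```

In this model every read fails until invalidate is called, however long the network has been back, which matches the symptom described above.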
Possible Solution
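The LeasedConnection returns a Future that wraps the inner Future that talks to the PLC. Only the inner Future can mark the connection as invalid so that it is removed from the cache, but when the caller's get(timeout, unit) times out, the TimeoutException is raised on the outer Future and the inner one never hears about it. For the moment I have been able to work around this by overriding the get method of the outer Future so that it passes the timeout on to the inner one (future in the snippet):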
@Override
public PlcReadResponse get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException {
    try {
        return super.get(timeout, unit);
    } catch (TimeoutException e) {
        future.completeExceptionally(e);
        throw e;
    }
}
You can see my solution in the zylklab fork (https://github.com/zylklab/plc4x/tree/Fix/nifi-integration-timeout). I would appreciate some feedback, and I would like to turn this into a PR as soon as possible.
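To see the wrapping pattern in isolation, here is a self-contained sketch that uses only java.util.concurrent. It is not the code from the fork or from PLC4X: TimeoutPropagatingFuture and the "marked invalid" callback are hypothetical stand-ins, and the sketch assumes the cache reacts to the inner future completing exceptionally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the outer future handed to the caller; not a PLC4X class.
class TimeoutPropagatingFuture<T> extends CompletableFuture<T> {
    // The inner future is the one whose failure is assumed to invalidate the cached connection.
    private final CompletableFuture<T> inner;

    TimeoutPropagatingFuture(CompletableFuture<T> inner) {
        this.inner = inner;
        // Mirror the inner result (or failure) onto this outer future.
        inner.whenComplete((value, error) -> {
            if (error != null) {
                completeExceptionally(error);
            } else {
                complete(value);
            }
        });
    }

    @Override
    public T get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return super.get(timeout, unit);
        } catch (TimeoutException e) {
            // Without this call only the caller learns about the timeout.
            inner.completeExceptionally(e);
            throw e;
        }
    }
}

class TimeoutPropagationDemo {
    public static void main(String[] args) throws Exception {
        // Never completed on its own, like a request sent over a dead link.
        CompletableFuture<String> inner = new CompletableFuture<>();

        // Hypothetical invalidation hook: a cache would drop the connection here.
        inner.whenComplete((value, error) -> {
            if (error != null) {
                System.out.println("connection marked invalid: " + error.getClass().getSimpleName());
            }
        });

        TimeoutPropagatingFuture<String> outer = new TimeoutPropagatingFuture<>(inner);
        try {
            outer.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("caller saw timeout");
        }
    }
}
```

One trade-off of this approach is that the timeout is only propagated when the caller uses the timed get; a caller that blocks on the untimed get() or polls the future another way would still leave the inner future pending.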
Code to replicate the problem:
public class ManualTest {
    public static void main(String[] args) throws InterruptedException {
        CachedPlcConnectionManager cachedPlcConnectionManager = CachedPlcConnectionManager
            .getBuilder(new DefaultPlcDriverManager())
            .withMaxLeaseTime(Duration.ofMinutes(5))
            .build();
        for (int i = 0; i < 100; i++) {
            Thread.sleep(1000);
            try (PlcConnection connection = cachedPlcConnectionManager.getConnection("s7://10.105.143.7:102?remote-rack=0&remote-slot=1&controller-type=S7_1200")) {
                PlcReadRequest.Builder plcReadRequestBuilder = connection.readRequestBuilder();
                plcReadRequestBuilder.addTagAddress("foo", "%DB1:DBX0.0:BOOL");
                PlcReadRequest plcReadRequest = plcReadRequestBuilder.build();
                PlcReadResponse plcReadResponse = plcReadRequest.execute().get(1000, TimeUnit.MILLISECONDS);
                System.out.printf("Run %d: Value: %f%n", i, plcReadResponse.getFloat("foo"));
            } catch (Exception e) {
                System.out.println("Failed to read due to: ");
                e.printStackTrace();
            }
        }
    }
}
Version
v0.11.0-SNAPSHOT
Programming Languages
- plc4j
- plc4go
- plc4c
- plc4net
Protocols
- AB-Ethernet
- ADS /AMS
- BACnet/IP
- CANopen
- DeltaV
- DF1
- EtherNet/IP
- Firmata
- KNXnet/IP
- Modbus
- OPC-UA
- S7
About this issue
- State: closed
- Created a year ago
- Comments: 25 (23 by maintainers)
Could you folks please try this again and let me know whether this issue is now fixed?