plc4x: [Bug]: plc4j-tools-connection-cache: broken connections remain in the cache on timeout

What happened?

Summary

When a connection stored in the connection-cache breaks due to a network failure, the connection is not removed from the cache and blocks future uses of the same connection string.

Context

Encountered while trying to solve a problem similar to https://github.com/apache/plc4x/issues/623 in the NiFi integration: when a processor is running and the network connection to the PLC is interrupted, the processor continues to throw errors even after the network connection is restored.

This was brought up in a mail by me (https://lists.apache.org/thread/xm38nh8xzh1m1kj0y74dx0goo81cos82) that sparked a pull request by heyoulin (https://github.com/apache/plc4x/pull/818), an issue by splatch (https://github.com/apache/plc4x/issues/821) and a commit from @chrisdutz (https://github.com/apache/plc4x/commit/9b06c2de0c77a7c1bbcb730bb5285c4435002c93).

The commit (https://github.com/apache/plc4x/commit/9b06c2de0c77a7c1bbcb730bb5285c4435002c93) did not fully address the problem, so here is my attempt to fix it.

Replicate the problem

To replicate the problem, use the ManualTest code further down in this report and follow these steps:

  1. Start the main below
  2. Disconnect network
  3. Wait until errors are shown in the stdout
  4. You will see that the connection is still being reused after it fails:
16:38:22.486 [main] DEBUG o.a.p.j.u.c.CachedPlcConnectionManager.getConnection:72 - Reusing exising connection
Failed to read due to: 
java.util.concurrent.TimeoutExceptio
  5. Reconnect network. The problem persists.
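
The behaviour seen in these steps can be modelled with a toy cache. This is only a sketch of the failure mode, not the plc4x implementation: FakeConnection, ToyCache and the "s7://example" string are hypothetical names made up for illustration. It shows that as long as the failure is never reported to the cache, the same broken entry keeps being handed out for that connection string.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheSketch {

    /** Hypothetical stand-in for a cached PLC connection. */
    static class FakeConnection {
        boolean broken;      // set when the network drops
        boolean invalidated; // set only if the failure is reported to the cache
    }

    /** Toy cache keyed by connection string (an assumption, not the real cache). */
    static class ToyCache {
        private final Map<String, FakeConnection> entries = new HashMap<>();
        int created; // number of connections opened so far

        FakeConnection getConnection(String url) {
            FakeConnection cached = entries.get(url);
            if (cached != null && !cached.invalidated) {
                return cached; // reused, even if it is broken
            }
            created++;
            FakeConnection fresh = new FakeConnection();
            entries.put(url, fresh);
            return fresh;
        }
    }

    /** Runs the scenario and returns how many connections were opened. */
    static int connectionsOpened(boolean reportTimeout) {
        ToyCache cache = new ToyCache();
        String url = "s7://example"; // placeholder connection string
        FakeConnection first = cache.getConnection(url);
        first.broken = true;          // network failure while the entry is cached
        if (reportTimeout) {
            first.invalidated = true; // the failure reaches the cache
        }
        cache.getConnection(url);     // next iteration, network restored
        return cache.created;
    }

    public static void main(String[] args) {
        System.out.println("timeout not reported: " + connectionsOpened(false) + " connection(s) opened");
        System.out.println("timeout reported:     " + connectionsOpened(true) + " connection(s) opened");
    }
}
```

In this model the unreported case opens one connection and keeps reusing it, while the reported case opens a second, fresh one.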

Possible Solution

The LeasedConnection returns a Future that wraps the Future that connects to the PLC. Only the inner Future can mark the connection as invalid so that it gets removed from the cache, and a timeout raised in the outer Future's get does not reach it. For the moment I have been able to work around this by overriding the get method of the outer Future so that it also fails the inner one:

@Override
public PlcReadResponse get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException {
    try {
        return super.get(timeout, unit);
    } catch (TimeoutException e) {
        future.completeExceptionally(e);
        throw e;
    }
}
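
To make the mechanism concrete, here is a minimal, self-contained sketch of the same idea using only java.util.concurrent. The names OuterFuture and timeoutInvalidatesConnection, and the AtomicBoolean that stands in for the cache's "mark invalid" hook, are assumptions for illustration and are not the plc4x classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutPropagationSketch {

    /** Caller-facing future that wraps the inner future owned by the connection. */
    static class OuterFuture<T> extends CompletableFuture<T> {
        private final CompletableFuture<T> inner;

        OuterFuture(CompletableFuture<T> inner) {
            this.inner = inner;
            // Forward the inner result (or failure) to the caller-facing future.
            inner.whenComplete((value, error) -> {
                if (error != null) {
                    completeExceptionally(error);
                } else {
                    complete(value);
                }
            });
        }

        @Override
        public T get(long timeout, TimeUnit unit)
                throws InterruptedException, ExecutionException, TimeoutException {
            try {
                return super.get(timeout, unit);
            } catch (TimeoutException e) {
                // Without this call only the caller learns about the timeout.
                inner.completeExceptionally(e);
                throw e;
            }
        }
    }

    /** Returns true if a caller-side timeout reached the invalidation hook. */
    static boolean timeoutInvalidatesConnection() {
        AtomicBoolean invalidated = new AtomicBoolean(false);
        // Never completed on its own: simulates a request over a dead link.
        CompletableFuture<String> inner = new CompletableFuture<>();
        // Stand-in for the hook that marks the cached connection as invalid.
        inner.whenComplete((value, error) -> {
            if (error != null) {
                invalidated.set(true);
            }
        });
        OuterFuture<String> outer = new OuterFuture<>(inner);
        try {
            outer.get(50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException expected) {
            // The caller still sees the timeout, exactly as before.
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return invalidated.get();
    }

    public static void main(String[] args) {
        System.out.println("invalidated after timeout: " + timeoutInvalidatesConnection());
    }
}
```

The design point is that the caller's timeout is turned into an exceptional completion of the inner future, so whatever is listening on that future gets a chance to react.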

You can see my solution in the zylklab fork (https://github.com/zylklab/plc4x/tree/Fix/nifi-integration-timeout). If you could give me some feedback, I would like to turn this into a PR as soon as possible.


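import java.time.Duration;
import java.util.concurrent.TimeUnit;

import org.apache.plc4x.java.DefaultPlcDriverManager;
import org.apache.plc4x.java.api.PlcConnection;
import org.apache.plc4x.java.api.messages.PlcReadRequest;
import org.apache.plc4x.java.api.messages.PlcReadResponse;
import org.apache.plc4x.java.utils.cache.CachedPlcConnectionManager;

public class ManualTest {

    public static void main(String[] args) throws InterruptedException {
        // One shared cache for the whole run, so every iteration asks for the same connection string.
        CachedPlcConnectionManager cachedPlcConnectionManager = CachedPlcConnectionManager
            .getBuilder(new DefaultPlcDriverManager())
            .withMaxLeaseTime(Duration.ofMinutes(5))
            .build();
        for (int i = 0; i < 100; i++) {
            Thread.sleep(1000);
            // try-with-resources returns the leased connection to the cache after each read.
            try (PlcConnection connection = cachedPlcConnectionManager.getConnection("s7://10.105.143.7:102?remote-rack=0&remote-slot=1&controller-type=S7_1200")) {
                PlcReadRequest.Builder plcReadRequestBuilder = connection.readRequestBuilder();
                plcReadRequestBuilder.addTagAddress("foo", "%DB1:DBX0.0:BOOL");
                PlcReadRequest plcReadRequest = plcReadRequestBuilder.build();

                // The 1 s timeout is what fires once the network is disconnected.
                PlcReadResponse plcReadResponse = plcReadRequest.execute().get(1000, TimeUnit.MILLISECONDS);
                System.out.printf("Run %d: Value: %f%n", i, plcReadResponse.getFloat("foo"));
            } catch (Exception e) {
                System.out.println("Failed to read due to: ");
                e.printStackTrace();
            }
        }
    }
}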

Version

v0.11.0-SNAPSHOT

Programming Languages

  • plc4j
  • plc4go
  • plc4c
  • plc4net

Protocols

  • AB-Ethernet
  • ADS /AMS
  • BACnet/IP
  • CANopen
  • DeltaV
  • DF1
  • EtherNet/IP
  • Firmata
  • KNXnet/IP
  • Modbus
  • OPC-UA
  • S7

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (23 by maintainers)

Most upvoted comments

Could you folks please try this again and let me know whether this issue is now fixed?