OpenRefine: Allow reconciliation dialog to have option to store errors in cells

Intermittent connectivity problems can occur during reconciliation batches, long or short, for all sorts of reasons (dogs, cats, chairs, etc.). Sometimes these are HTTP errors where connectivity is still established. Other times they are SocketExceptions or other networking abnormalities that prevent TCP packets (and hence HTTP) from being sent or received.

Proposed solution

It would be nice for the Reconciliation dialog, like the Fetch URLs dialog, to offer an option to store errors in the cells. Maybe enabled by default? That way the errors could be retrieved with cell.errorMessage, and the failed cells could be filtered out for retry, separately from those whose batch queries returned successfully.
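
As a rough illustration of the idea, here is a minimal sketch in Java. Everything in it is hypothetical: CellOutcome, BatchReconService and ErrorStoringRecon are stand-ins invented for this example, not OpenRefine's actual Cell, Recon or ReconOperation classes, and the storeErrors flag stands for the proposed dialog option. When a batch fails, every cell in that batch keeps the error message, so the failed cells can be picked out later for a retry.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reconciled cell; NOT OpenRefine's Cell or Recon class.
final class CellOutcome {
    final String value;
    final String match;        // null when no result was obtained
    final String errorMessage; // null when no error was stored

    CellOutcome(String value, String match, String errorMessage) {
        this.value = value;
        this.match = match;
        this.errorMessage = errorMessage;
    }

    boolean failed() {
        return errorMessage != null;
    }
}

// Hypothetical service: assumed to return exactly one match per query, in order.
interface BatchReconService {
    List<String> reconcile(List<String> queries) throws IOException;
}

final class ErrorStoringRecon {
    // Reconciles values in batches; a failed batch marks each of its cells
    // with the error (when storeErrors is on) instead of silently skipping them.
    static List<CellOutcome> run(List<String> values, int batchSize,
                                 BatchReconService service, boolean storeErrors) {
        List<CellOutcome> outcomes = new ArrayList<>();
        for (int start = 0; start < values.size(); start += batchSize) {
            int end = Math.min(start + batchSize, values.size());
            List<String> batch = values.subList(start, end);
            try {
                List<String> matches = service.reconcile(batch);
                for (int i = 0; i < batch.size(); i++) {
                    outcomes.add(new CellOutcome(batch.get(i), matches.get(i), null));
                }
            } catch (IOException e) {
                String message = storeErrors
                        ? e.getClass().getName() + ": " + e.getMessage()
                        : null;
                for (String value : batch) {
                    outcomes.add(new CellOutcome(value, null, message));
                }
            }
        }
        return outcomes;
    }

    // Collects the values whose batch failed, ready to be reconciled again.
    static List<String> valuesToRetry(List<CellOutcome> outcomes) {
        List<String> retry = new ArrayList<>();
        for (CellOutcome outcome : outcomes) {
            if (outcome.failed()) {
                retry.add(outcome.value);
            }
        }
        return retry;
    }
}
```

With the option switched off, failed cells in this sketch end up with neither a match nor an error, which is roughly the situation described above: they look unreconciled and there is nothing to filter on.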

Alternatives considered

Maybe it needs to be a deeper field added to cell.recon? Or possibly just at the cell.value level? But it would be nice to have one place for cell errors, so I think cell.errorMessage might be better for now. Up to the devs on this, as long as it’s documented at Cell or Recon: https://github.com/OpenRefine/OpenRefine/wiki/Variables#cell

Additional context

Related to #2229, but this issue specifically asks to store errors, as one part of the improvements for unreliable connections mentioned in #2229.

OpenRefine log example from a reconciliation that was 80% complete when a connectivity issue occurred (my chair rolled back too far and accidentally pulled the network plug):

        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
        at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
        at java.lang.Thread.run(Unknown Source)
12:47:53.233 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (77ms)
12:47:53.233 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.249 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (16ms)
12:47:53.249 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.264 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.264 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.279 [    refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"Edward Jones","type":"Q4830453","type_strict":"should"},"q1":{"query":"Dropbox","type":"Q4830453","type_strict":"should"},"q2":{"query":"Novo Nordisk","type":"Q4830453","type_strict":"should"},"q3":{"query":"Nvidia","type":"Q4830453","type_strict":"should"},"q4":{"query":"Farmers Insurance","type":"Q4830453","type_strict":"should"},"q5":{"query":"CHG Healthcare","type":"Q4830453","type_strict":"should"},"q6":{"query":"USAA","type":"Q4830453","type_strict":"should"},"q7":{"query":"Burns & McDonnell","type":"Q4830453","type_strict":"should"},"q8":{"query":"Cisco","type":"Q4830453","type_strict":"should"},"q9":{"query":"Kettering Health Network","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
        at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
        at java.lang.Thread.run(Unknown Source)
12:47:53.357 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (78ms)
12:47:53.357 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.372 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.373 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (1ms)
12:47:53.388 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.388 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.403 [    refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"Publix Super Markets","type":"Q4830453","type_strict":"should"},"q1":{"query":"Cadence","type":"Q4830453","type_strict":"should"},"q2":{"query":"Workday","type":"Q4830453","type_strict":"should"},"q3":{"query":"Progressive Insurance","type":"Q4830453","type_strict":"should"},"q4":{"query":"Adobe","type":"Q4830453","type_strict":"should"},"q5":{"query":"Allianz Life Insurance Co. of North America","type":"Q4830453","type_strict":"should"},"q6":{"query":"Atlantic Health System","type":"Q4830453","type_strict":"should"},"q7":{"query":"Bain & Co.","type":"Q4830453","type_strict":"should"},"q8":{"query":"Sheetz","type":"Q4830453","type_strict":"should"},"q9":{"query":"Hilton","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
        at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
        at java.lang.Thread.run(Unknown Source)
12:47:53.480 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (77ms)
12:47:53.480 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.495 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.496 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (1ms)
12:47:53.511 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.511 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.526 [    refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"World Wide Technology","type":"Q4830453","type_strict":"should"},"q1":{"query":"KPMG","type":"Q4830453","type_strict":"should"},"q2":{"query":"Custom Ink","type":"Q4830453","type_strict":"should"},"q3":{"query":"The Cheesecake Factory","type":"Q4830453","type_strict":"should"},"q4":{"query":"Delta Air Lines","type":"Q4830453","type_strict":"should"},"q5":{"query":"W. L. Gore & Associates","type":"Q4830453","type_strict":"should"},"q6":{"query":"Baptist Health South Florida","type":"Q4830453","type_strict":"should"},"q7":{"query":"Nationwide","type":"Q4830453","type_strict":"should"},"q8":{"query":"Four Seasons Hotels & Resorts","type":"Q4830453","type_strict":"should"},"q9":{"query":"Camden Property Trust","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
        at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
        at java.lang.Thread.run(Unknown Source)

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Just answered a question relating to a failed reconciliation process at https://forum.openrefine.org/t/what-does-it-mean-when-something-is-unreconciled/113/4 - the lack of feedback to the user on what’s happened has definitely caused some confusion here.

“Failed to batch recon with load:” is just an error prefix and not useful. It’s like just saying ERROR. The query payload is also not useful, I think, and should be displayed only if a debug option is enabled.

What is more useful is the deeper exception.
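
To make the point concrete, here is a small sketch in Java of how the deepest exception could be pulled out of a wrapped error and turned into a short message. The class name RootCause and its describe method are made up for this illustration and are not part of OpenRefine. It only relies on the standard Throwable.getCause() chain.

```java
// Hypothetical helper for illustration only; not an OpenRefine class.
final class RootCause {
    // Walks the cause chain to the deepest exception and formats it
    // as "fully.qualified.ClassName: message".
    static String describe(Throwable error) {
        Throwable root = error;
        // The second condition guards against a throwable that lists itself as its cause.
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        String message = root.getMessage();
        String name = root.getClass().getName();
        return message == null ? name : name + ": " + message;
    }
}
```

For an error wrapped as new RuntimeException("Failed to batch recon with load:", new java.net.SocketException("Network is unreachable: connect")), this yields the same kind of line that appears in the log above: java.net.SocketException: Network is unreachable: connect.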

As for your work on displaying errors, I think this is sufficient for now to at least surface errors and present them to the user. We can address more useful errors on a case-by-case basis in a new issue.

I still think fixing https://github.com/OpenRefine/OpenRefine/issues/2229 would provide a bigger user benefit.

@tfmorris Both are worth doing: retrying queries will not remove the need for proper error reporting, since some errors can persist no matter how often they are retried.
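
A minimal sketch in Java of how the two fit together, under stated assumptions: IoRequest, RetryOutcome and RetryThenReport are hypothetical names invented here, not OpenRefine or Apache HttpClient APIs, and the loop retries immediately, whereas a real implementation would presumably wait between attempts. A transient failure is absorbed by the retries, while a persistent one still has to be reported once the attempts run out.

```java
import java.io.IOException;

// Hypothetical request abstraction; stands in for one batch query.
interface IoRequest<T> {
    T send() throws IOException;
}

// Either a result (error == null) or the last error seen (result == null).
final class RetryOutcome<T> {
    final T result;
    final String error;
    final int attempts;

    RetryOutcome(T result, String error, int attempts) {
        this.result = result;
        this.error = error;
        this.attempts = attempts;
    }
}

final class RetryThenReport {
    // Tries the request up to maxAttempts times; if every attempt fails,
    // the last error is returned so that it can be stored and shown to the user.
    static <T> RetryOutcome<T> call(IoRequest<T> request, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new RetryOutcome<>(request.send(), null, attempt);
            } catch (IOException e) {
                lastError = e.getClass().getName() + ": " + e.getMessage();
            }
        }
        return new RetryOutcome<>(null, lastError, maxAttempts);
    }
}
```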

The other issue mentioned above, https://github.com/OpenRefine/OpenRefine/issues/87, is marked as closed by an unmerged (and unreviewed) commit, which is odd. A mistake, perhaps?

No, that’s intentional - we can discuss this more in the forum thread about releasing 4.0-alpha2.

What do we do after storing the error in cell.recon.error? Do we display it in the cell instead of leaving the cell unreconciled?

I would render it similarly to an unmatched cell: we display the cell value and, underneath it, the reconciliation error (in place of the reconciliation candidates). It’s probably useful to still leave the option to match the cell manually. We can tweak the rendering after testing, I would say.