OpenRefine: Allow reconciliation dialog to have option to store errors in cells
Intermittent connectivity problems can occur during reconciliation batches, long or short, for various reasons (dogs, cats, chairs, etc.). Sometimes these are HTTP errors where connectivity is still established. Other times they are SocketExceptions or other networking abnormalities that prevent TCP packets (and hence HTTP) from being sent or received.
Proposed solution
It would be nice for the Reconciliation dialog, like the Fetch URLs dialog, to offer an option to store errors in the cells. Maybe enabled by default?
That way the errors could be retrieved with cell.errorMessage, and the failed cells could be filtered out for retry, separately from those whose batch queries returned successfully.
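A minimal sketch of the idea, in Java since OpenRefine is written in it. Nothing here is OpenRefine's actual API: `CellResult`, `ReconBatchSketch`, `reconcileBatch` and `queriesToRetry` are hypothetical names, and the reconciliation service is represented by a plain function. It only illustrates a failed batch marking each of its cells with the error, so that those cells can be selected for a retry later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a cell: holds either a match or an error message.
final class CellResult {
    final String query;
    final String match;        // null when reconciliation failed
    final String errorMessage; // null when reconciliation succeeded

    CellResult(String query, String match, String errorMessage) {
        this.query = query;
        this.match = match;
        this.errorMessage = errorMessage;
    }

    boolean isError() {
        return errorMessage != null;
    }
}

final class ReconBatchSketch {
    // Runs one batch. If the service call throws, every cell in the batch
    // records the error instead of being silently left unreconciled.
    // The service is assumed to wrap checked exceptions (for example in
    // UncheckedIOException), because Function cannot throw them directly.
    static List<CellResult> reconcileBatch(List<String> queries,
                                           Function<List<String>, List<String>> service) {
        try {
            List<String> matches = service.apply(queries);
            List<CellResult> ok = new ArrayList<>();
            for (int i = 0; i < queries.size(); i++) {
                ok.add(new CellResult(queries.get(i), matches.get(i), null));
            }
            return ok;
        } catch (RuntimeException e) {
            String message = e.getClass().getName() + ": " + e.getMessage();
            List<CellResult> failed = new ArrayList<>();
            for (String q : queries) {
                failed.add(new CellResult(q, null, message));
            }
            return failed;
        }
    }

    // Selects the queries whose cells carry an error, ready for a retry pass.
    static List<String> queriesToRetry(List<CellResult> results) {
        List<String> retry = new ArrayList<>();
        for (CellResult r : results) {
            if (r.isError()) {
                retry.add(r.query);
            }
        }
        return retry;
    }
}
```

Keeping the error on the cell itself, rather than only in the server log, is what makes the later "filter and retry" step possible without re-running the whole column.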
Alternatives considered
Maybe it needs to be a deeper field added to cell.recon? Or possibly just at the cell.value level? But it would be nice to have one place for cell errors, so I think cell.errorMessage might be better for now. Up to the devs on this, as long as it’s documented at Cell or Recon: https://github.com/OpenRefine/OpenRefine/wiki/Variables#cell
Additional context
Related to #2229 but this issue specifically asks to store errors as part of other improvements mentioned in #2229 for unreliable connections.
OpenRefine log example from a reconciliation that was 80% complete when a connectivity issue occurred (my chair backed up too far and accidentally pulled the network plug):
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
at java.lang.Thread.run(Unknown Source)
12:47:53.233 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (77ms)
12:47:53.233 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.249 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (16ms)
12:47:53.249 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.264 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.264 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.279 [ refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"Edward Jones","type":"Q4830453","type_strict":"should"},"q1":{"query":"Dropbox","type":"Q4830453","type_strict":"should"},"q2":{"query":"Novo Nordisk","type":"Q4830453","type_strict":"should"},"q3":{"query":"Nvidia","type":"Q4830453","type_strict":"should"},"q4":{"query":"Farmers Insurance","type":"Q4830453","type_strict":"should"},"q5":{"query":"CHG Healthcare","type":"Q4830453","type_strict":"should"},"q6":{"query":"USAA","type":"Q4830453","type_strict":"should"},"q7":{"query":"Burns & McDonnell","type":"Q4830453","type_strict":"should"},"q8":{"query":"Cisco","type":"Q4830453","type_strict":"should"},"q9":{"query":"Kettering Health Network","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
at java.lang.Thread.run(Unknown Source)
12:47:53.357 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (78ms)
12:47:53.357 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.372 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.373 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (1ms)
12:47:53.388 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.388 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.403 [ refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"Publix Super Markets","type":"Q4830453","type_strict":"should"},"q1":{"query":"Cadence","type":"Q4830453","type_strict":"should"},"q2":{"query":"Workday","type":"Q4830453","type_strict":"should"},"q3":{"query":"Progressive Insurance","type":"Q4830453","type_strict":"should"},"q4":{"query":"Adobe","type":"Q4830453","type_strict":"should"},"q5":{"query":"Allianz Life Insurance Co. of North America","type":"Q4830453","type_strict":"should"},"q6":{"query":"Atlantic Health System","type":"Q4830453","type_strict":"should"},"q7":{"query":"Bain & Co.","type":"Q4830453","type_strict":"should"},"q8":{"query":"Sheetz","type":"Q4830453","type_strict":"should"},"q9":{"query":"Hilton","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
at java.lang.Thread.run(Unknown Source)
12:47:53.480 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (77ms)
12:47:53.480 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.495 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.496 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (1ms)
12:47:53.511 [..mpl.execchain.RetryExec] I/O exception (java.net.SocketException) caught when processing request to {s}->https://wdreconcile.toolforge.org:443: Network is unreachable: connect (15ms)
12:47:53.511 [..mpl.execchain.RetryExec] Retrying request to {s}->https://wdreconcile.toolforge.org:443 (0ms)
12:47:53.526 [ refine-standard-recon] Failed to batch recon with load:
{"q0":{"query":"World Wide Technology","type":"Q4830453","type_strict":"should"},"q1":{"query":"KPMG","type":"Q4830453","type_strict":"should"},"q2":{"query":"Custom Ink","type":"Q4830453","type_strict":"should"},"q3":{"query":"The Cheesecake Factory","type":"Q4830453","type_strict":"should"},"q4":{"query":"Delta Air Lines","type":"Q4830453","type_strict":"should"},"q5":{"query":"W. L. Gore & Associates","type":"Q4830453","type_strict":"should"},"q6":{"query":"Baptist Health South Florida","type":"Q4830453","type_strict":"should"},"q7":{"query":"Nationwide","type":"Q4830453","type_strict":"should"},"q8":{"query":"Four Seasons Hotels & Resorts","type":"Q4830453","type_strict":"should"},"q9":{"query":"Camden Property Trust","type":"Q4830453","type_strict":"should"}} (15ms)
java.net.SocketException: Network is unreachable: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:339)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.google.refine.model.recon.StandardReconConfig.batchRecon(StandardReconConfig.java:478)
at com.google.refine.operations.recon.ReconOperation$ReconProcess.run(ReconOperation.java:282)
at java.lang.Thread.run(Unknown Source)
About this issue
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 15 (15 by maintainers)
Commits related to this issue
- #3194 removes repetitive lines from the test — committed to ayushrai206/OpenRefine by ayushrai206 10 months ago
- #3194 updates the front end code to store error in the cells — committed to ayushrai206/OpenRefine by ayushrai206 10 months ago
- Store reconciliation errors in cell.recon.error (#6006) * adds the error variable * uses formatter and fixes formatting errors * tests for the errors in the batchrecon method * #3194 removes... — committed to OpenRefine/OpenRefine by ayushrai206 8 months ago
Just answered a question relating to a failed reconciliation process at https://forum.openrefine.org/t/what-does-it-mean-when-something-is-unreconciled/113/4 - the lack of feedback to the user on what’s happened has definitely caused some confusion here.
“Failed to batch recon with load:” is just an error prefix and not useful. It’s like saying just ERROR. The query payload is also not useful, I think, and should be displayed only if a debug option is enabled. What is more useful is the deeper exception.
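As a rough illustration of surfacing the deeper exception rather than the generic prefix, the sketch below walks an exception's cause chain and builds a short message from the innermost cause. `RootCauseSketch` and its methods are hypothetical helpers, not part of OpenRefine; only `Throwable.getCause()` and `getMessage()` are standard Java.

```java
// Hypothetical helper: reports the innermost cause of a failure.
final class RootCauseSketch {
    // Walks the cause chain to the innermost exception.
    // Assumes the chain is acyclic, which is the normal case.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Builds a short message from the root cause, e.g. for display in a cell.
    static String userFacingMessage(Throwable t) {
        Throwable root = rootCause(t);
        String detail = root.getMessage();
        return root.getClass().getSimpleName() + (detail == null ? "" : ": " + detail);
    }
}
```

For the log in this issue, a wrapper exception around the `SocketException` would reduce to its "Network is unreachable: connect" message, which is the part a user can act on.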
As far as your work on displaying errors goes, I think this is sufficient for now to at least surface errors and present them to the user. We can address more useful errors on a case-by-case basis in a new issue.
@tfmorris Both are worth doing: retrying queries will not remove the need for proper error reporting, since some errors can persist no matter how often they are retried.
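To make the point concrete, here is a hedged sketch of how retrying and error reporting can coexist: a bounded retry loop that still returns the last error once the attempts are exhausted. `RetryOutcome`, `RetrySketch` and `callWithRetry` are hypothetical names, not OpenRefine or HttpClient APIs, and the sketch deliberately omits any delay between attempts, which a real implementation would probably want.

```java
import java.util.concurrent.Callable;

// Hypothetical result type: either a value or the last error seen.
final class RetryOutcome<T> {
    final T value;      // null on failure
    final String error; // null on success
    final int attempts; // number of attempts actually made

    RetryOutcome(T value, String error, int attempts) {
        this.value = value;
        this.error = error;
        this.attempts = attempts;
    }
}

final class RetrySketch {
    // Tries the call up to maxAttempts times. A persistent failure is not
    // swallowed: the last exception is reported in the outcome.
    static <T> RetryOutcome<T> callWithRetry(Callable<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                return new RetryOutcome<>(call.call(), null, attempts);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                last = e;
                break;
            } catch (Exception e) {
                last = e;
            }
        }
        String error = last.getClass().getName() + ": " + last.getMessage();
        return new RetryOutcome<>(null, error, attempts);
    }
}
```

A transient failure disappears after a retry or two, while a persistent one, such as the unplugged network cable above, ends up in `error` where it can be stored and shown to the user.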
No, that’s intentional - we can discuss this more in the forum thread about releasing 4.0-alpha2.
I would render it similarly to an unmatched cell: we display the cell value, and underneath the reconciliation error (in place of the reconciliation candidates). It’s probably useful to still leave the option to match the cell manually - we can tweak the rendering after testing, I would say.