ray: [Core] log_to_driver=False does not suppress worker errors in ipython
What happened + What you expected to happen
When a Ray task raises an error in an IPython session, an “Unhandled error” message appears in my session output after a short delay (about 1 second). Initializing Ray with log_to_driver=False does not prevent it. Setting the environment variable RAY_IGNORE_UNHANDLED_ERRORS=1 does suppress the message, but I think log_to_driver=False should suppress all Ray task errors. I know it suppresses warnings.
Error message:
2022-08-31 14:39:17,467 ERROR worker.py:399 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::raise_error() (pid=55178, ip=127.0.0.1)
File "<ipython-input-1-4cb397045297>", line 8, in raise_error
RuntimeError
This error is relevant to Modin, which saves the object IDs returned by remote function calls. Modin users in IPython get spammed with Ray task errors in addition to the nicer exception raised in the main thread when Modin finally calls ray.get on the object ID of the task that raised the error. It would be excellent for the Modin user experience if there were a way to suppress these errors. RAY_IGNORE_UNHANDLED_ERRORS=1 is not an option for us because we shouldn’t modify the user’s environment. Suppressing the errors with log_to_driver=False would be great.
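One possible stopgap that leaves the user's environment untouched is a standard-library logging filter that drops records whose message starts with "Unhandled error", matching the log line shown above. The sketch below is a minimal illustration, not Ray's or Modin's actual mechanism: the class name DropUnhandledRayErrors and the helper install_filter are hypothetical, and the name of the logger that Ray uses to emit this message is an assumption you would need to confirm for your Ray version. The demonstration runs on a stand-in logger so it works without Ray installed.

```python
import logging


class DropUnhandledRayErrors(logging.Filter):
    """Hypothetical filter: discard records starting with 'Unhandled error'."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return not record.getMessage().startswith("Unhandled error")


def install_filter(logger_name: str) -> logging.Filter:
    """Attach the filter to the named logger and return it for later removal.

    Which logger name Ray uses for this message is an assumption the caller
    must verify; it is deliberately left as a parameter here.
    """
    flt = DropUnhandledRayErrors()
    logging.getLogger(logger_name).addFilter(flt)
    return flt


# Demonstration on a stand-in logger (not a real Ray logger).
captured = []


class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of printing them."""

    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record.getMessage())


demo = logging.getLogger("demo.worker")
demo.propagate = False
demo.addHandler(ListHandler())
install_filter("demo.worker")

demo.error(
    "Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): "
    "ray::raise_error()"
)
demo.error("some other error")
print(captured)  # ['some other error']
```

A filter attached to a logger only applies to records logged directly on that logger, so it has to be installed on the exact logger that emits the message. The filter can be removed again with removeFilter on the same logger.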
Versions / Dependencies
- ray 2.0.0
- macOS Monterey 12.4
- python 3.10.4
- ipython 8.4.0
Reproduction script
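import ray
import time

ray.init(log_to_driver=False)

@ray.remote
def raise_error():
    raise RuntimeError()

# Keep the object reference but never call ray.get on it.
oid = raise_error.remote()
# Wait long enough for the "Unhandled error" message to appear.
time.sleep(5)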
Issue Severity
Low: It annoys or frustrates me.
About this issue
- State: open
- Created 2 years ago
- Comments: 16 (10 by maintainers)
Case 1: an exception object is garbage-collected before it is accessed/handled. It is then an unhandled error, so the error message appears in both interactive and script mode. (@scv119 this is the case you described)
Case 2: an exception object is accessed/handled within 5s of being created. The error message does not appear in either mode.
Case 3: an exception object is accessed/handled more than 5s after it is created. Interactive mode shows the error message but script mode does not. (This is the case @mvashishtha encountered, and we should see how to fix it.)
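The three cases above can be summarized as a small decision function. This is only a model of the behavior described in the comment, not Ray code: the names Mode, GRACE_PERIOD_S, and shows_unhandled_error are hypothetical, and treating exactly 5s as "within 5s" is an assumption, since the cases do not say which side the boundary falls on.

```python
from enum import Enum
from typing import Optional

# The 5s window mentioned in the cases; the name is illustrative only.
GRACE_PERIOD_S = 5.0


class Mode(Enum):
    INTERACTIVE = "interactive"
    SCRIPT = "script"


def shows_unhandled_error(handled_after_s: Optional[float], mode: Mode) -> bool:
    """Return True if the "Unhandled error" message is expected to appear.

    handled_after_s is the time between the exception object's creation and
    its first access, or None if it is garbage-collected without ever being
    accessed.
    """
    if handled_after_s is None:
        # Case 1: never accessed, so the message appears in both modes.
        return True
    if handled_after_s <= GRACE_PERIOD_S:
        # Case 2: accessed within the window, so no message in either mode.
        # (Counting exactly 5s as "within" is an assumption.)
        return False
    # Case 3: accessed late; only interactive mode shows the message.
    return mode is Mode.INTERACTIVE


print(shows_unhandled_error(None, Mode.SCRIPT))       # True  (case 1)
print(shows_unhandled_error(1.0, Mode.INTERACTIVE))   # False (case 2)
print(shows_unhandled_error(10.0, Mode.INTERACTIVE))  # True  (case 3)
print(shows_unhandled_error(10.0, Mode.SCRIPT))       # False (case 3)
```

In this model, an interactive session that resolves the object reference late corresponds to the third call, which is the situation reported in this issue.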