ray: [core] "Windows fatal exception: access violation" cluttering terminal

What is the problem?

I am using Ray 1.1.0 with Python 3.7.6 to run an ActorPool. Each actor needs access to its own copy of a Java virtual machine, created with jpype (a dependency of another package the actors use, and apparently the root of this issue). Ray seems to handle this just fine; however, it prints many lines of errors to the terminal, all of which are repeats of:

(pid=18064) Windows fatal exception: access violation
(pid=18064)
(pid=18064) Stack (most recent call first):
(pid=18064)   File "C:\ProgramData\Anaconda3\lib\site-packages\jpype\_core.py", line 222 in startJVM
(pid=18064)   File "c:\Users\Kursti\Documents\Python\ray_access_violation.py", line 15 in __init__
(pid=18064)   File "C:\ProgramData\Anaconda3\lib\site-packages\ray\function_manager.py", line 556 in actor_method_executor
(pid=18064)   File "C:\ProgramData\Anaconda3\lib\site-packages\ray\worker.py", line 383 in main_loop
(pid=18064)   File "C:\ProgramData\Anaconda3\lib\site-packages\ray\workers/default_worker.py", line 181 in <module>
(pid=11676) Windows fatal exception: access violation

Again, the code we're running seems to work fine, but the terminal clutter makes it challenging to work with. This issue has also come up intermittently without jpype, but we have not found a reliable way to reproduce that. Any idea how we can fix this problem?

Reproduction (REQUIRED)

import psutil
import ray
import jpype

@ray.remote
class ObjectiveFunc(object):
    def __init__(self):
        # Starting a JVM inside the actor process is what triggers the spam.
        # (startJVM() returns None, so there is nothing useful to store.)
        jpype.startJVM()

class RayMap(object):
    def __init__(self, num_workers):
        self.workers = []
        for _ in range(num_workers):
            self.workers.append(ObjectiveFunc.remote())

num_cpus = psutil.cpu_count(logical=False)
ray.init(num_cpus=num_cpus, include_dashboard=True)
rm = RayMap(4)
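
For what it's worth, the "Stack (most recent call first):" format of the spam matches Python's faulthandler module, which Ray enables in its worker processes, and the JVM is known to trigger access violations deliberately as part of its normal memory management. One workaround we are considering (untested, and the actor name here is just illustrative) is to disable faulthandler in each worker before starting the JVM:

import faulthandler

import jpype
import ray

@ray.remote
class QuietObjectiveFunc(object):
    def __init__(self):
        # faulthandler is what prints "Windows fatal exception: ..." reports.
        # Disabling it in this worker process may silence the spurious reports
        # caused by the JVM's own access-violation handling; it also disables
        # genuine crash tracebacks for this worker.
        faulthandler.disable()
        jpype.startJVM()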


  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Hi Evan,

A partial solution to this problem is to pass ray.init(log_to_driver=False) when initializing your Ray cluster. This got rid of some of the mess in the terminal caused by the particular library I was using (jpype), but messages related to other things still show up sometimes (seemingly at random). Wish I could help more, and if you find a solution please post it to GitHub!
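
For example, a minimal sketch (the num_cpus value is just a placeholder):

import ray

# Stop forwarding worker stdout/stderr to the driver's terminal. This hides
# the access-violation spam, but also hides legitimate worker output; logs
# are still written to files under Ray's session log directory.
ray.init(num_cpus=4, log_to_driver=False)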

Thanks, Avi

On Fri, Feb 5, 2021 at 4:26 AM Evan Hu (YiFan Hu) notifications@github.com wrote:

Any idea how to solve this? I have a similar problem when I use the deap package: the code seems to run fine, but it keeps printing "fatal" exception messages. They appear to be printed output rather than real exceptions.

@ray.remote
class Ray_Deap_Map():
    def __init__(self, creator_setup=None, pset_creator=None):
        # issue 946? Ensure non-trivial startup to prevent bad load balance
        # across a cluster
        # sleep(0.01)

        # recreate scope from global
        # For GA no need to provide pset_creator. Both needed for GP
        self.creator_setup = creator_setup
        self.psetCreator = pset_creator
        if creator_setup is not None:
            self.creator_setup()
            self.psetCreator()

    def ray_remote_eval_batch(self, f, iterable):
        # iterable, id_ = zipped_input
        # attach id so we can reorder the batches
        return [f(i) for i in iterable]

def ray_deap_map(func, pop, creator_setup, pset_creator):
    n_workers = int(ray.cluster_resources()['CPU'])
    if n_workers == 1:
        results = list(map(func, pop))  # forced eval to time it
    else:
        # many workers: cap the worker count at the population size
        n_workers = min(n_workers, len(pop))

        n_per_batch = int(len(pop) / n_workers) + 1
        batches = [pop[i:i + n_per_batch] for i in range(0, len(pop), n_per_batch)]
        actors = [Ray_Deap_Map.remote(creator_setup, pset_creator)
                  for _ in range(n_workers)]
        result_ids = [a.ray_remote_eval_batch.remote(func, b)
                      for a, b in zip(actors, batches)]
        results = sum(ray.get(result_ids), [])  # flatten the batched results

    return results
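
(To illustrate the intended use, a hypothetical minimal call; evaluate and pop here are placeholders, not names from this thread:)

ray.init()
pop = list(range(8))

def evaluate(ind):
    # placeholder fitness function
    return ind * ind

fitnesses = ray_deap_map(evaluate, pop, creator_setup=None, pset_creator=None)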

(pid=31996) Windows fatal exception: access violation
(pid=31996)
(pid=21820) Windows fatal exception: access violation
(pid=21820)
(pid=31372) Windows fatal exception: access violation
(pid=31372)
(pid=24640) Windows fatal exception: access violation
(pid=24640)
(pid=31380) Windows fatal exception: access violation
(pid=31380)
(pid=15396) Windows fatal exception: access violation
(pid=15396)
(pid=21660) Windows fatal exception: access violation
(pid=21660)
(pid=21976) Windows fatal exception: access violation
(pid=21976)
(pid=29076) Windows fatal exception: access violation
(pid=29076)
(pid=32212) Windows fatal exception: access violation
(pid=32212)
(pid=25964) Windows fatal exception: access violation
(pid=25964)
(pid=17224) Windows fatal exception: access violation
(pid=17224)
(pid=31964) Windows fatal exception: access violation
(pid=31964)
(pid=25632) Windows fatal exception: access violation
(pid=25632)
(pid=27112) Windows fatal exception: access violation
(pid=27112)
(pid=32620) Windows fatal exception: access violation

And then at some point, it will crash with

2021-02-05 17:24:29,648 WARNING worker.py:1034 -- The log monitor on node DESKTOP-QJDSQ0R failed with the following error: OSError: [WinError 87] The parameter is incorrect.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\eiahb\.conda\envs\env_genetic_programming\lib\site-packages\ray\log_monitor.py", line 354, in <module>
    log_monitor.run()
  File "C:\Users\eiahb\.conda\envs\env_genetic_programming\lib\site-packages\ray\log_monitor.py", line 275, in run
    self.open_closed_files()
  File "C:\Users\eiahb\.conda\envs\env_genetic_programming\lib\site-packages\ray\log_monitor.py", line 164, in open_closed_files
    self.close_all_files()
  File "C:\Users\eiahb\.conda\envs\env_genetic_programming\lib\site-packages\ray\log_monitor.py", line 102, in close_all_files
    os.kill(file_info.worker_pid, 0)
SystemError: <built-in function kill> returned a result with an error set

forrtl: error (200): program aborting due to control-C event
Image              PC                Routine    Line     Source
libifcoremd.dll    00007FFDC0AE3B58  Unknown    Unknown  Unknown
KERNELBASE.dll     00007FFE221862A3  Unknown    Unknown  Unknown
KERNEL32.DLL       00007FFE24217C24  Unknown    Unknown  Unknown
ntdll.dll          00007FFE2470D4D1  Unknown    Unknown  Unknown
Windows fatal exception: access violation

please do help


Hmm, I think you might want to look at ray/services.py?