interpret: Segmentation Fault

This may be related to https://github.com/interpretml/interpret/issues/435.

I was trying to run model_ExpBoostReg.fit(X, y) when I got the following segfault (full notebook available here):

---------------------------------------------------------------------------
TerminatedWorkerError                     Traceback (most recent call last)
Cell In[12], line 1
----> 1 model_ExpBoostReg.fit(X, y)

File /opt/conda/lib/python3.11/site-packages/interpret/glassbox/_ebm/_ebm.py:848, in EBMModel.fit(self, X, y, sample_weight, init_score)
    821         early_stopping_rounds_local = 0
    823     parallel_args.append(
    824         (
    825             dataset,
   (...)
    845         )
    846     )
--> 848 results = provider.parallel(boost, parallel_args)
    850 # let python reclaim the dataset memory via reference counting
    851 del parallel_args  # parallel_args holds references to dataset, so must be deleted

File /opt/conda/lib/python3.11/site-packages/interpret/provider/_compute.py:19, in JobLibProvider.parallel(self, compute_fn, compute_args_iter)
     18 def parallel(self, compute_fn, compute_args_iter):
---> 19     results = Parallel(n_jobs=self.n_jobs)(
     20         delayed(compute_fn)(*args) for args in compute_args_iter
     21     )
     22     return results

File /opt/conda/lib/python3.11/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
   1095     self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File /opt/conda/lib/python3.11/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())

File /opt/conda/lib/python3.11/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
    564 """Wrapper for Future.result to implement the same behaviour as
    565 AsyncResults.get from multiprocessing."""
    566 try:
--> 567     return future.result(timeout=timeout)
    568 except CfTimeoutError as e:
    569     raise TimeoutError from e

File /opt/conda/lib/python3.11/concurrent/futures/_base.py:456, in Future.result(self, timeout)
    454     raise CancelledError()
    455 elif self._state == FINISHED:
--> 456     return self.__get_result()
    457 else:
    458     raise TimeoutError()

File /opt/conda/lib/python3.11/concurrent/futures/_base.py:401, in Future.__get_result(self)
    399 if self._exception:
    400     try:
--> 401         raise self._exception
    402     finally:
    403         # Break a reference cycle with the exception in self._exception
    404         self = None

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGSEGV(-11), SIGSEGV(-11), SIGSEGV(-11), SIGSEGV(-11), SIGSEGV(-11), SIGSEGV(-11), SIGSEGV(-11)}
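
For readers unfamiliar with the exit codes in that message: on POSIX systems a negative return code reported by Python means the process was killed by the signal with that number, so -11 corresponds to SIGSEGV. The sketch below uses only the standard library to illustrate the mapping. The helper names `describe_exit_code` and `crash_child_returncode` are hypothetical, and this is not how joblib or loky build their message; it only shows the convention, assuming a POSIX platform.

```python
import signal
import subprocess
import sys


def describe_exit_code(returncode: int) -> str:
    """Render a subprocess return code in a 'NAME(code)' style."""
    if returncode < 0:
        # A negative return code means "terminated by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "UNKNOWN_SIGNAL"
        return f"{name}({returncode})"
    return f"EXIT({returncode})"


def crash_child_returncode() -> int:
    """Start a child interpreter that sends SIGSEGV to itself (POSIX only)."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    proc = subprocess.run([sys.executable, "-c", child_code])
    return proc.returncode


if __name__ == "__main__":
    # Report how the parent sees a child that died from a segmentation fault.
    print(describe_exit_code(crash_child_returncode()))
```

A worker that exits this way never returns a Python exception to the parent, which is why the traceback ends in TerminatedWorkerError rather than pointing at the failing native call.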

About this issue

  • State: closed
  • Created a year ago
  • Comments: 20

Most upvoted comments

That’s great, @paulbkoch. The segfault is gone 😃 I have run into another problem, but I will probably raise it in a separate issue. Thanks a lot!

Hi @ricardobarroslourenco – I’m pretty sure I know what the problem is. It should only affect the conda build. I’ll put out a new release shortly fixing the issue.

Thanks for all your help!!

I can repro the error! I can take it from here and will let you know what I find.

Hi @ricardobarroslourenco – Thanks for submitting this error! Can you make 2 changes to that script and send me the new log:

  • Set native_debug=True instead of False.
  • Pass the parameter n_jobs=1 to the ExplainableBoostingRegressor.

Is this a public dataset that I can download?
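
The two changes requested above might look like the sketch below. It assumes the original script enabled logging through `interpret.develop.debug_mode`, which is a guess since the script is not shown here. The helper name `fit_for_debug_log` and the log file name are hypothetical. With n_jobs=1, joblib should run the boosting in the calling process instead of in worker processes, so a crash is not hidden behind TerminatedWorkerError.

```python
def fit_for_debug_log(X, y, log_filename="log.txt"):
    """Fit an EBM regressor with native debug logging and a single job."""
    # Imported lazily so this helper can be defined without interpret installed.
    from interpret.develop import debug_mode
    from interpret.glassbox import ExplainableBoostingRegressor

    # Change 1: native_debug=True instead of False.
    # Assumption: the original script called debug_mode in this way.
    debug_mode(log_filename=log_filename, log_level="INFO", native_debug=True)

    # Change 2: n_jobs=1 avoids spawning joblib worker processes.
    model = ExplainableBoostingRegressor(n_jobs=1)
    model.fit(X, y)
    return model
```

Calling `fit_for_debug_log(X, y)` with the same X and y as in the notebook would then write the log to send back.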