accelerate: how to cleanly exit when using `accelerate launch`
sorry for the silly question, i’ve searched and cannot find the answer
System Info
- accelerate 0.16.0
- python 3.10.6
- torch 2.0.0.dev20230211+cu118
- CUDA: 11.8
- cuDNN: 8.7.0
- os: ubuntu 22.04 running in wsl2 on windows 11
Description
i’m launching python script using
accelerate launch script.py
the script itself runs a gradio app web server, and when gradio queues are enabled there is no clean way to shut down the web server, which means the only way to exit is `os._exit(0)`.
but if i do that, i always get a long traceback from accelerate itself. how do i tell accelerate that the exit is fine and that it should not print the traceback?
note: accelerate 0.16 added the `--quiet` flag, but it doesn’t change the behavior
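For context, here is a minimal sketch of the child-side exit described here, assuming the script installs its own SIGINT handler. `exit_on_sigint` and `install_exit_handler` are hypothetical names, not gradio or accelerate APIs. This only controls how the child process exits; in a terminal, ctrl+c is also delivered to the parent `accelerate launch` process, which is where the traceback below is raised.

```python
import os
import signal


def exit_on_sigint(signum, frame):
    # os._exit ends the process immediately with the given exit code and
    # skips atexit handlers and finally blocks, so shutdown cannot hang on
    # server threads that are still running.
    os._exit(0)


def install_exit_handler():
    """Make ctrl+c end this process immediately with exit code 0."""
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, exit_on_sigint)


# In script.py this would be called before starting the web server:
# install_exit_handler()
```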
Traceback
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/vlado/.local/bin/accelerate:8 in <module> │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /home/vlado/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /home/vlado/.local/lib/python3.10/site-packages/accelerate/commands/launch.py:1097 in │
│ launch_command │
│ │
│ 1094 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1095 │ │ sagemaker_launcher(defaults, args) │
│ 1096 │ else: │
│ ❱ 1097 │ │ simple_launcher(args) │
│ 1098 │
│ 1099 │
│ 1100 def main(): │
│ │
│ /home/vlado/.local/lib/python3.10/site-packages/accelerate/commands/launch.py:549 in │
│ simple_launcher │
│ │
│ 546 │ current_env["OMP_NUM_THREADS"] = str(args.num_cpu_threads_per_process) │
│ 547 │ │
│ 548 │ process = subprocess.Popen(cmd, env=current_env) │
│ ❱ 549 │ process.wait() │
│ 550 │ if process.returncode != 0: │
│ 551 │ │ if not args.quiet: │
│ 552 │ │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ │
│ /usr/lib/python3.10/subprocess.py:1207 in wait │
│ │
│ 1204 │ │ if timeout is not None: │
│ 1205 │ │ │ endtime = _time() + timeout │
│ 1206 │ │ try: │
│ ❱ 1207 │ │ │ return self._wait(timeout=timeout) │
│ 1208 │ │ except KeyboardInterrupt: │
│ 1209 │ │ │ # https://bugs.python.org/issue25942 │
│ 1210 │ │ │ # The first keyboard interrupt waits briefly for the child to │
│ │
│ /usr/lib/python3.10/subprocess.py:1941 in _wait │
│ │
│ 1938 │ │ │ │ │ with self._waitpid_lock: │
│ 1939 │ │ │ │ │ │ if self.returncode is not None: │
│ 1940 │ │ │ │ │ │ │ break # Another thread waited. │
│ ❱ 1941 │ │ │ │ │ │ (pid, sts) = self._try_wait(0) │
│ 1942 │ │ │ │ │ │ # Check the pid and loop as waitpid has been known to │
│ 1943 │ │ │ │ │ │ # return 0 even without WNOHANG in odd situations. │
│ 1944 │ │ │ │ │ │ # http://bugs.python.org/issue14396. │
│ │
│ /usr/lib/python3.10/subprocess.py:1899 in _try_wait │
│ │
│ 1896 │ │ def _try_wait(self, wait_flags): │
│ 1897 │ │ │ """All callers to this function MUST hold self._waitpid_lock.""" │
│ 1898 │ │ │ try: │
│ ❱ 1899 │ │ │ │ (pid, sts) = os.waitpid(self.pid, wait_flags) │
│ 1900 │ │ │ except ChildProcessError: │
│ 1901 │ │ │ │ # This happens if SIGCLD is set to be ignored or waiting │
│ 1902 │ │ │ │ # for child processes has otherwise been disabled for our │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyboardInterrupt
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
- create a python script that uses gradio with queue enabled
- run the python script using `accelerate launch`
- exit the python script
Expected behavior
able to exit without long traceback being printed
About this issue
- State: open
- Created a year ago
- Comments: 16
Hi @vladmandic, I found another issue requesting the same thing. I’ll look into this relatively soon (promise not months out like before), and we’ll put in explicit exception handling for `KeyboardInterrupt` if possible. (Again, I saw that subprocess has some issues with doing that, but there may be a few tricks I can use to get around it. I won’t 100% promise it’ll be fully possible, but I’ll try and see.)
cc @patrickvonplaten for awareness - this issue has been open for 7 months without any attention from the huggingface team, and it affects the single most common use of the accelerate library. it’s also marked as “enhancement” when from a user’s perspective it’s anything but: it’s an issue.
i am also facing the same issue…
it’s marked as stale, but there is no progress from the huggingface team - this is still open and very much valid!
i tried setting try…except blocks in different places, nothing helped. this stack is from the parent `accelerate launch` process and comes from the return code check there: since the exit is due to a keyboard interrupt (ctrl+c), the exit code will never be 0, thus triggering printing of the full stack.
the only solution i found so far is to modify the `simple_launcher()` function in `accelerate/commands/launch.py` and add a try…except block there.
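The workaround described above can be sketched roughly as follows. This is a standalone illustration, not a patch to accelerate: `wait_quietly` and `launch_script` are hypothetical names, and the code only mirrors the `subprocess.Popen(...)` / `process.wait()` pattern visible in the traceback, with a try…except around the wait. Returning 0 after ctrl+c is an assumption about what a "clean exit" should mean here.

```python
import subprocess
import sys


def wait_quietly(cmd, env=None):
    """Run cmd as a child process and return an exit code.

    Hypothetical helper: ctrl+c in the parent is treated as a normal
    shutdown instead of letting KeyboardInterrupt propagate.
    """
    process = subprocess.Popen(cmd, env=env)
    try:
        process.wait()
    except KeyboardInterrupt:
        # In a terminal, ctrl+c goes to the whole foreground process group,
        # so the child has normally received SIGINT as well. Give it a
        # moment to finish before stopping it explicitly.
        try:
            process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            process.terminate()
            process.wait()
        # Assumption: an interrupt requested by the user counts as success.
        return 0
    return process.returncode


def launch_script(script_path, *script_args):
    """Run a python script with the current interpreter via wait_quietly."""
    return wait_quietly([sys.executable, script_path, *script_args])
```

A wrapper script could then end with `sys.exit(launch_script("script.py"))`. A second ctrl+c during the 5 second grace period would still raise `KeyboardInterrupt`; that case is left unhandled to keep the sketch short.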