trafilatura: Celery error with v1.2.1: ValueError: signal only works in main thread
With version 1.2.1 it is not possible to run trafilatura extraction inside an async task such as a Celery task.
https://github.com/adbar/trafilatura/blob/1bb5fee6a4812e53b6597053c25efde995174d79/trafilatura/core.py#L982
It would be better to have HAS_SIGNAL as a config variable rather than a hardcoded value.
celery_1 | text = trafilatura.extract(
celery_1 | File "/usr/local/lib/python3.8/site-packages/trafilatura/core.py", line 982, in extract
celery_1 | signal(SIGALRM, timeout_handler)
celery_1 | File "/usr/local/lib/python3.8/signal.py", line 47, in signal
celery_1 | handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
celery_1 | ValueError: signal only works in main thread
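The traceback comes down to a CPython rule: a signal handler can only be installed from the main thread, and here extract() is called from a worker thread. A minimal standard-library sketch that reproduces the same ValueError without trafilatura or Celery (the function names are made up for illustration, and SIGALRM is only available on Unix):

```python
import signal
import threading


def timeout_handler(signum, frame):
    """Placeholder handler; it is never triggered in this demo."""
    raise TimeoutError("timed out")


def try_install_alarm_handler(errors):
    """Try to register a SIGALRM handler and record any ValueError."""
    try:
        previous = signal.signal(signal.SIGALRM, timeout_handler)
        # Put the earlier handler back so the demo leaves no trace.
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
    except ValueError as exc:
        errors.append(str(exc))


main_errors, worker_errors = [], []

# Succeeds when this script itself runs in the main thread.
try_install_alarm_handler(main_errors)

# The same call from any other thread is rejected by the interpreter.
worker = threading.Thread(target=try_install_alarm_handler, args=(worker_errors,))
worker.start()
worker.join()

print(main_errors)
print(worker_errors)
```

The worker thread records the same "signal only works in main thread" message as in the log above, which is why the error shows up as soon as extraction runs outside the main thread.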
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (7 by maintainers)
Commits related to this issue
- extraction: make signal triggers optional (#202) — committed to adbar/trafilatura by adbar 2 years ago
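For readers wondering what "optional signal triggers" can look like in practice, here is a rough sketch. It is not the code from the commit above; the helper names (can_use_alarm, run_with_optional_timeout) are hypothetical, and the idea is simply to arm SIGALRM only when a timeout is requested and the call runs in the main thread.

```python
import signal
import threading


def can_use_alarm(timeout):
    """True only when an alarm-based timeout is both requested and possible."""
    return (
        timeout > 0
        and hasattr(signal, "SIGALRM")  # not available on Windows
        and threading.current_thread() is threading.main_thread()
    )


def _raise_timeout(signum, frame):
    raise TimeoutError("processing took too long")


def run_with_optional_timeout(func, timeout, *args):
    """Call func(*args), arming an alarm only if can_use_alarm() allows it."""
    if not can_use_alarm(timeout):
        # Worker thread or timeout disabled: run without touching signals.
        return func(*args)
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(timeout)
    try:
        return func(*args)
    finally:
        signal.alarm(0)  # cancel the pending alarm
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
```

With a guard like this, a timeout of 0 or a call from a Celery or Flask worker thread skips the signal setup entirely, at the cost of running without a time limit in those cases.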
If you are still struggling with this issue then here is my solution:
from trafilatura.settings import use_config
import trafilatura
config = use_config()
config.set("DEFAULT", "EXTRACTION_TIMEOUT", "0")
downloaded = trafilatura.fetch_url('https://the-URL-you-want-to-extract')
output = trafilatura.extract(downloaded, config=config)
I was struggling with a Flask app that was not picking up the values from the cfg file (I don’t know why and didn’t have time to investigate).
Hi @mikii121, please use the latest version and a specially crafted settings file:
- setting EXTRACTION_TIMEOUT to 0 will disable signal
- extract(downloaded, settingsfile="myfile.cfg")
For more see extraction settings.
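To make the settings-file route a bit more concrete, here is a small sketch of the one entry that matters for this issue. It assumes the settings file is an INI-style file with a DEFAULT section (the config.set("DEFAULT", ...) call earlier in this thread suggests so) and only uses Python's configparser to check that the value reads back as 0. The file below is deliberately incomplete: a real myfile.cfg presumably needs the library's other settings as well, so starting from a copy of the default settings file is likely the safer path.

```python
import configparser
import os
import tempfile

# Hypothetical, incomplete settings file: only the key discussed in this issue.
SETTINGS_TEXT = """\
[DEFAULT]
# 0 disables the signal-based timeout
EXTRACTION_TIMEOUT = 0
"""

path = os.path.join(tempfile.mkdtemp(), "myfile.cfg")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(SETTINGS_TEXT)

# Read the file back the way an INI parser would see it.
parser = configparser.ConfigParser()
parser.read(path, encoding="utf-8")
extraction_timeout = parser.getint("DEFAULT", "EXTRACTION_TIMEOUT")
print(extraction_timeout)
```

If the value does not come back as expected in your own setup, checking the file path and the section name this way is a quick first step before passing the file to extract().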
Hi @alex-bender, thanks for your feedback. Can you try something similar to this solution?
If it is too much of a problem I could make the use of signal optional. Is anyone here experiencing the same problem?
Confirming that using the below fixed the ValueError issue:
extract(downloaded, settingsfile="myfile.cfg")
Yes, it seems to work. Thanks all.
On Mon, Aug 1, 2022 at 2:25 PM Adrien Barbaresi @.***> wrote:
– Spirovski Bozidar
Thanks both, I’ll try it over the weekend and post results.
Hi @adbar. Thanks for your answer. It works for me. 😉
I have a similar setup; the only difference is the absence of --master --processes 4 --threads 2, so I'm going to try that. Will let you know.