scrapy: Logging can't format stack trace with non-ascii chars on Python 2
Hi,
I experience the same issue as described in #1602. However, I’m not using Django.
The stats look like this:
{'downloader/request_bytes': 47621,
'downloader/request_count': 103,
'downloader/request_method_count/GET': 103,
'downloader/response_bytes': 1162618,
'downloader/response_count': 103,
'downloader/response_status_count/200': 101,
'downloader/response_status_count/302': 2,
'dupefilter/filtered': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 3, 9, 2, 3, 15, 748633),
'httpcache/firsthand': 72,
'httpcache/hit': 31,
'httpcache/miss': 72,
'httpcache/store': 72,
'item_scraped_count': 48,
'log_count/DEBUG': 215,
'log_count/ERROR': 1,
'log_count/INFO': 9,
'memusage/max': 121434112,
'memusage/startup': 69783552,
'mongodb/item_stored_count': 48,
'request_depth_max': 3,
'response_received_count': 101,
'scheduler/dequeued': 102,
'scheduler/dequeued/memory': 102,
'scheduler/enqueued': 102,
'scheduler/enqueued/memory': 102,
'spider_exceptions/AttributeError': 1,
'start_time': datetime.datetime(2018, 3, 9, 2, 0, 52, 510449)}
But there’s no mention of AttributeError (or any other error) in the log.
I’m executing the spiders on Scrapyd instances running in Docker containers. These are the first two lines of the log, showing the component versions:
2018-03-09 02:00:52 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: realestate)
2018-03-09 02:00:52 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.3.1, w3lib 1.18.0, Twisted 17.9.0, Python 2.7.12 (default, Nov 20 2017, 18:23:56) - [GCC 5.4.0 20160609], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017), cryptography 2.1.4, Platform Linux-4.4.0-103-generic-x86_64-with-Ubuntu-16.04-xenial
There’s nothing special in the settings.py, but I can provide it if needed.
About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (9 by maintainers)
Thanks for sharing the code that causes the issue @tlinhart
I managed to reproduce it with the following minimal spider:
@ayushmankoul if you want to help, the above is a starting point. Thanks.
Hey @cathalgarvey @kmike, I am interested in solving this bug. Please help me fix it. Any guidance would be helpful. Thank you.
@cathalgarvey I understand your point, of course; don’t take my note above too seriously 😃
Hi @ayushmankoul, I only had to add `# -*- coding: utf-8 -*-` at the top of your example, but other than that, it fixed the issue.
As a note, it is only reproducible on Python 2. It works fine under Python 3.6.4.
The problem is clearly that the non-ASCII character can’t be formatted in the stack trace. It could be a bug in Twisted or in the Python logger.
The relevant part of Scrapy that triggers the bug is at https://github.com/scrapy/scrapy/blob/6cc6bbb5fc5c102271829a554772effb0444023c/scrapy/core/scraper.py#L154-L159
where `failure_to_exc_info()` is quite simplistic: https://github.com/scrapy/scrapy/blob/6cc6bbb5fc5c102271829a554772effb0444023c/scrapy/utils/log.py#L20-L23
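For anyone following along, here is a rough, stdlib-only sketch of the pattern those two links describe: a failure object is turned into a `(type, value, traceback)` tuple and handed to the logger through `exc_info`. It is not Scrapy's actual code. `FakeFailure`, `make_failure` and `log_failure` are hypothetical names made up for illustration, `FakeFailure` only imitates the small part of Twisted's `Failure` the helper needs, and the request string is a placeholder. It is written for Python 3, where the problem does not reproduce, so it only shows the code path involved, not the failure itself.

```python
# -*- coding: utf-8 -*-
import io
import logging


class FakeFailure(object):
    """Hypothetical stand-in for twisted.python.failure.Failure."""

    def __init__(self, exc):
        self.type = type(exc)
        self.value = exc
        self._tb = exc.__traceback__  # Python 3 only

    def getTracebackObject(self):
        return self._tb


def failure_to_exc_info(failure):
    """Build the (type, value, traceback) tuple that logging expects."""
    if isinstance(failure, FakeFailure):
        return (failure.type, failure.value, failure.getTracebackObject())
    return None


def make_failure():
    """Raise and capture an error whose message contains non-ASCII text."""
    try:
        raise AttributeError("no attribute 'título'")
    except AttributeError as exc:
        return FakeFailure(exc)


def log_failure(failure):
    """Log the failure with its traceback and return the formatted text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("exc_info_sketch")
    logger.addHandler(handler)
    logger.propagate = False
    try:
        logger.error(
            "Spider error processing %(request)s",
            {"request": "<GET http://example.com>"},  # placeholder request
            exc_info=failure_to_exc_info(failure),
        )
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(log_failure(make_failure()))
```

The point of the sketch is that the helper does no encoding work of its own: whatever text ends up in the traceback is left entirely to the logging formatter, which is where a Python 2 str/unicode mismatch would have to be handled.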