icrawler: TypeError: 'NoneType' object is not iterable
For me the GoogleImageCrawler of icrawler doesn’t work anymore. I updated the user agent in crawler.py, since that worked in the past, but no luck this time. I tried it on both Python 3.8 and 3.9 (Apple silicon, but that shouldn’t matter). Again, it worked in the past (about 3-6 months ago).
Even the simple example
from icrawler.builtin import GoogleImageCrawler
searchterm = 'ANY SEARCHTERM'
google_crawler = GoogleImageCrawler(storage={'root_dir': 'test'})
google_crawler.crawl(keyword=searchterm, max_num=1)
gives
...lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File ".../python3.8/site-packages/icrawler-0.6.6-py3.8.egg/icrawler/parser.py", line 104, in worker_exec
for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
Does anyone know how to fix this, or have the same issue in July 2022?
About this issue
- State: open
- Created 2 years ago
- Comments: 18
Commits related to this issue
- Update google.py https://github.com/hellock/icrawler/issues/107 — committed to jfreyberg/icrawler by jfreyberg 2 years ago
I have the same problem. I tried rolling back to an older version, but that did not help. I saw that this library has had this kind of problem before. I am running the following code:
self.__search_word = 'cat'
self.__count = 10
google = GoogleImageCrawler(storage={"root_dir": path})
filters = dict(size='>1024x768', date=((2020, 1, 1), (2021, 11, 30)))
The output is:
2022-07-10 10:36:53,907 - INFO - icrawler.crawler - start crawling...
2022-07-10 10:36:53,907 - INFO - icrawler.crawler - starting 1 feeder threads...
2022-07-10 10:36:53,914 - INFO - feeder - thread feeder-001 exit
2022-07-10 10:36:53,915 - INFO - icrawler.crawler - starting 1 parser threads...
2022-07-10 10:36:53,916 - INFO - icrawler.crawler - starting 1 downloader threads...
2022-07-10 10:36:54,117 - INFO - parser - parsing result page https://www.google.com/search?q=apex&ijn=1&start=150&tbs=isz%3Alt%2Cislt%3Axga%2Ccdr%3A1%2Ccd_min%3A01%2F01%2F2020%2Ccd_max%3A11%2F30%2F2021&tbm=isch
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Administrator\PycharmProjects\BPG\venv37\lib\site-packages\icrawler\parser.py", line 104, in worker_exec
for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
python-BaseException
The problem is in builtin/google.py. Replace the parse function around line 148 with this…
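The tracebacks in this thread all fail at `for task in self.parse(response, **kwargs)`, which means `parse` returned `None` instead of an iterable. That typically happens when a function falls off the end without a `return` because nothing matched. The following is a minimal, self-contained sketch of that failure mode and a defensive guard. It is not the replacement `parse` function referred to above, and the names `fragile_parse`, `safe_tasks`, and the regular expression are hypothetical illustrations that do not come from icrawler.

```python
import re

# Hypothetical pattern for illustration only; not the one icrawler uses.
IMAGE_URL = re.compile(r'https?://[^\s"<>]+?\.(?:jpg|png|bmp)')


def fragile_parse(html):
    """Mimic a parser that implicitly returns None when nothing matches."""
    urls = IMAGE_URL.findall(html)
    if urls:
        return [{"file_url": u} for u in urls]
    # No explicit return here, so the caller receives None.


def safe_tasks(parse, html):
    """Wrap a parser so the caller always gets a list, never None."""
    result = parse(html)
    return [] if result is None else list(result)


if __name__ == "__main__":
    empty_page = "<html><body>no images here</body></html>"
    try:
        for task in fragile_parse(empty_page):
            pass
    except TypeError as exc:
        print(exc)  # 'NoneType' object is not iterable
    print(safe_tasks(fragile_parse, empty_page))  # []
```

A guard like this only stops the parser thread from crashing. If the page layout has changed so that the pattern no longer matches, the crawl will still download nothing until the extraction logic itself is updated.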
Also same error!
Exception in thread parser-001:
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/younus/.local/lib/python3.10/site-packages/icrawler/parser.py", line 94, in worker_exec
for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2023-07-25 09:04:15,067 - INFO - downloader - no more download task for thread downloader-001
2023-07-25 09:04:15,069 - INFO - downloader - thread downloader-001 exit
2023-07-25 09:04:15,073 - INFO - icrawler.crawler - Crawling task done!
This broke recently. I tried different Python versions, but still no progress. Please help to fix this.
Much better, but it still doesn’t work. It generates errors of the following type:
2022-12-15 09:27:02,994 - ERROR - downloader - Exception caught when downloading file //www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg, error: '', remaining retry times: 2
2022-12-15 09:27:02,996 - ERROR - downloader - Exception caught when downloading file //www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg, error: '', remaining retry times: 1
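The URL in these errors is protocol-relative (it starts with `//`) and points at a Google logo SVG rather than a search result. One possible workaround, sketched here under assumptions, is to filter candidate URLs before they reach the downloader. The helper name `keep_image_url`, the extension list, and the choice to drop scheme-less URLs rather than repair them are all assumptions for illustration; none of this is part of icrawler's API.

```python
from urllib.parse import urlparse

# Assumed allow-list; adjust to the image formats you actually want.
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def keep_image_url(url):
    """Return True only for absolute http(s) URLs with an allowed extension."""
    parsed = urlparse(url)
    # Protocol-relative URLs such as //www.gstatic.com/... have an empty scheme.
    if parsed.scheme not in ("http", "https"):
        return False
    # Compare against the path only, so query strings do not interfere.
    return parsed.path.lower().endswith(ALLOWED_EXTENSIONS)


if __name__ == "__main__":
    candidates = [
        "//www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg",
        "https://example.com/cat.JPG?x=1",
    ]
    print([u for u in candidates if keep_image_url(u)])
```

In a custom parser, a filter like this could be applied to the extracted URLs before the task dictionaries are built, so the downloader never retries on branding assets.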