icrawler: TypeError: 'NoneType' object is not iterable

The GoogleImageCrawler from icrawler no longer works for me. I updated the user agent in crawler.py, since that seemed to help in the past, but no luck here. I tried it on both Python 3.8 and 3.9 (Apple Silicon, but that shouldn't matter). Again, it worked in the past (3-6 months ago).

Even the simple example

from icrawler.builtin import GoogleImageCrawler
searchterm = 'ANY SEARCHTERM'
google_crawler = GoogleImageCrawler(storage={'root_dir': 'test'})
google_crawler.crawl(keyword=searchterm, max_num=1)

gives

...lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File ".../python3.8/site-packages/icrawler-0.6.6-py3.8.egg/icrawler/parser.py", line 104, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
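For what it's worth, the traceback just means that `parse` returned `None`: when the Google results page no longer matches what the parser expects, the extraction code falls off the end of the function without returning anything, and `worker_exec` then tries to iterate over `None`. A minimal illustration (the `parse` below is a stand-in, not icrawler's actual implementation):

```python
def parse(response):
    # Stand-in for the library's parse(): if nothing on the page matches
    # and no explicit return is reached, Python implicitly returns None.
    pass

tasks = parse("<html>unexpected page layout</html>")
try:
    for task in tasks:  # this is what worker_exec does
        pass
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable
```

So the real question is why `parse` finds nothing, i.e. what changed in Google's result page markup.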

Does anyone know how to fix this, or have the same issue in July 2022?

Most upvoted comments

I have the same problem; rolling back to an older version did not help. I saw that this library has had this problem before. I am executing the following code (with self.__search_word = 'cat' and self.__count = 10):

google = GoogleImageCrawler(storage={"root_dir": path})
filters = dict(size='>1024x768', date=((2020, 1, 1), (2021, 11, 30)))
try:
    google.crawl(keyword=self.__search_word, max_num=self.__count,
                 filters=filters, offset=rnd.randint(0, 500))
except Exception as _ex:
    logger.error("Something happened when uploading images", _ex)

at the output I get:

2022-07-10 10:36:53,907 - INFO - icrawler.crawler - start crawling...
2022-07-10 10:36:53,907 - INFO - icrawler.crawler - starting 1 feeder threads...
2022-07-10 10:36:53,914 - INFO - feeder - thread feeder-001 exit
2022-07-10 10:36:53,915 - INFO - icrawler.crawler - starting 1 parser threads...
2022-07-10 10:36:53,916 - INFO - icrawler.crawler - starting 1 downloader threads...
2022-07-10 10:36:54,117 - INFO - parser - parsing result page https://www.google.com/search?q=apex&ijn=1&start=150&tbs=isz%3Alt%2Cislt%3Axga%2Ccdr%3A1%2Ccd_min%3A01%2F01%2F2020%2Ccd_max%3A11%2F30%2F2021&tbm=isch
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\PycharmProjects\BPG\venv37\lib\site-packages\icrawler\parser.py", line 104, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
python-BaseException

The problem is in builtin/google.py. Replace the parse function (around line 148) with this:

def parse(self, response):
    soup = BeautifulSoup(
        response.content.decode('utf-8', 'ignore'), 'lxml')
    # Grab every <img> tag's src attribute; on current result pages these
    # are mostly inline data URIs and thumbnail URLs.
    images = soup.find_all(name='img')
    uris = []
    for img in images:
        if img.has_attr('src'):
            uris.append(img['src'])
    return [{'file_url': uri} for uri in uris]

Also same error!

Exception in thread parser-001:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/younus/.local/lib/python3.10/site-packages/icrawler/parser.py", line 94, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2023-07-25 09:04:15,067 - INFO - downloader - no more download task for thread downloader-001
2023-07-25 09:04:15,069 - INFO - downloader - thread downloader-001 exit
2023-07-25 09:04:15,073 - INFO - icrawler.crawler - Crawling task done!

This broke recently… I tried different Python versions but still no progress. Please help fix this.

> The problem is in builtin/google.py. Replace the parse function (around line 148) with this…

Much better, but it still doesn’t work. It generates errors of the following type:

2022-12-15 09:27:02,994 - ERROR - downloader - Exception caught when downloading file //www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg, error: '', remaining retry times: 2
2022-12-15 09:27:02,996 - ERROR - downloader - Exception caught when downloading file //www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg, error: '', remaining retry times: 1
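Those stray downloads come from the patched parse collecting every `<img>` src on the page, including Google's own branding assets and protocol-relative URLs (starting with `//`) that the downloader apparently can't fetch. A small post-processing step could filter those out before the URIs are returned. This is only a sketch; the helper name and the gstatic check are my own, not part of icrawler:

```python
def normalize_image_urls(srcs):
    """Filter/normalize <img> src values before handing them to the downloader.

    Hypothetical helper (not part of icrawler): skips inline data: URIs,
    skips Google's own branding images, and turns protocol-relative
    //host/path references into https:// URLs.
    """
    uris = []
    for src in srcs:
        if src.startswith('data:'):                # inline base64 thumbnail
            continue
        if 'gstatic.com/images/branding' in src:   # Google logo assets
            continue
        if src.startswith('//'):                   # protocol-relative URL
            src = 'https:' + src
        uris.append(src)
    return uris

srcs = [
    '//www.gstatic.com/images/branding/googlelogo/svg/googlelogo_clr_160x56px.svg',
    'data:image/gif;base64,R0lGOD...',
    '//encrypted-tbn0.gstatic.com/images?q=tbn:abc',
    'https://example.com/photo.jpg',
]
print(normalize_image_urls(srcs))
# → ['https://encrypted-tbn0.gstatic.com/images?q=tbn:abc', 'https://example.com/photo.jpg']
```

Inside the patched parse, the last line would then become `return [{'file_url': u} for u in normalize_image_urls(uris)]`.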