scrapy: [bug?] `while True` in `start_requests(self)` makes scrapy unable to consume the yields.
I'm doing:

```python
def start_requests(self):
    while True:
        words = read_a_list_wanna_crawl()
        ips = get_a_ip_list()
        if words.count() > 0:
            for word, ip in zip(words, ips):
                print('do while')
                yield scrapy.Request(processed_url, self.html_parse,
                                     meta={'proxy': ip, ...})
```
When `zip(words, ips)` produces only one pair, Scrapy prints `do while` forever (an infinite loop) and never downloads any requests; when it produces more than one pair, Scrapy does not enter the infinite loop.
Is this a bug? Can Scrapy handle this?
PS: (another way to solve this) Is it possible to create a fake `scrapy.Request()` that performs no actual request but still invokes the callback, to complete this kind of control flow in Scrapy?
About this issue
- State: closed
- Created 6 years ago
- Comments: 18 (11 by maintainers)
A good asynchronous solution is to use the `spider_idle` signal to schedule requests in batches: when you yield a request, Scrapy puts the request object into a scheduling pool, and the scheduler dispatches requests concurrently once enough request objects have accumulated or a short interval has elapsed.
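For reference, a minimal sketch of that batching pattern with an illustrative spider: it connects a handler to the `spider_idle` signal in `from_crawler`, reuses the question's placeholder helpers `read_a_list_wanna_crawl()` / `get_a_ip_list()`, and assumes each word is already a fetchable URL. `engine.crawl(request)` assumes Scrapy >= 2.10; older releases also expect the spider as a second argument.

```python
import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class WordSpider(scrapy.Spider):
    name = 'word_spider'

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Run schedule_next_batch every time the spider goes idle.
        crawler.signals.connect(spider.schedule_next_batch,
                                signal=signals.spider_idle)
        return spider

    def start_requests(self):
        # Seed only the first batch; later batches are scheduled from
        # the spider_idle handler instead of an infinite while loop.
        yield from self.build_batch()

    def build_batch(self):
        # Placeholder helpers from the question; replace with your own.
        words = read_a_list_wanna_crawl()
        ips = get_a_ip_list()
        for word, ip in zip(words, ips):
            # Assumes each word is already a fetchable URL.
            yield scrapy.Request(word, callback=self.html_parse,
                                 meta={'proxy': ip}, dont_filter=True)

    def schedule_next_batch(self):
        scheduled = False
        for request in self.build_batch():
            # Scrapy >= 2.10: engine.crawl(request); older versions
            # take the spider as a second argument.
            self.crawler.engine.crawl(request)
            scheduled = True
        if scheduled:
            # Keep the spider alive so it can go idle again later;
            # if no requests were scheduled, the spider closes normally.
            raise DontCloseSpider

    def html_parse(self, response):
        self.logger.info('got %s', response.url)
```

This keeps `start_requests` finite: the generator is fully consumed, the downloader drains the batch, and the `spider_idle` handler decides whether another batch exists rather than spinning in a loop that never returns control to the reactor.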