fbcrawl: Blocked after crawling
Don’t use your personal Facebook profile to crawl.
Hello, we’re starting to experience some blocking by Facebook. After a certain number of “next pages” have been visited, the profile is temporarily suspended for about 1 hour.
If Scrapy ends abruptly with this error, your account has been blocked:
File "/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 170, in parse_page
if response.meta['flag'] == self.k and self.k >= self.year:
KeyError: 'flag'
This prevents you from visiting any page on mbasic.facebook.com during the blocking period. However, the block does not seem to be fully enforced on m.facebook.com and facebook.com: you can still access public pages, but not private profiles!
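One way to fail more gracefully when this happens is to read the meta key defensively and stop the crawl when it is missing. This is only a rough sketch, not fbcrawl’s actual code, and it assumes that a missing 'flag' means Facebook served a block/checkpoint page instead of the expected timeline:

import scrapy
from scrapy.exceptions import CloseSpider

class SafeFbSpider(scrapy.Spider):
    # Hypothetical spider used only to illustrate the idea; fbcrawl's real
    # spider lives in fbcrawl/spiders/fbcrawl.py
    name = 'fb_safe'

    def parse_page(self, response):
        flag = response.meta.get('flag')  # .get() returns None instead of raising KeyError
        if flag is None:
            # The meta chain broke, most likely because Facebook served a
            # block/checkpoint page instead of the expected timeline
            self.logger.warning('missing meta["flag"] at %s, possibly blocked', response.url)
            raise CloseSpider('account temporarily blocked by Facebook')
        # ...continue with the normal year/pagination checks here...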

If you are experiencing this issue, in settings.py set:
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 1
This forces sequential crawling and noticeably slows the crawler down, but it ensures a better final result. Increase DOWNLOAD_DELAY if you are still being blocked. More experiments are needed to assess the situation; please report your findings and suggestions here.
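For reference, a minimal settings.py sketch. The first two settings are the ones recommended above; the AutoThrottle lines are an extra, optional suggestion (standard Scrapy settings, not part of the original advice):

# settings.py
CONCURRENT_REQUESTS = 1   # one request at a time, i.e. sequential crawling
DOWNLOAD_DELAY = 1        # seconds between requests; raise this if you still get blocked

# Optional: let Scrapy adapt the delay to how fast the site responds
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 10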
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 15 (1 by maintainers)
hey, add a time.sleep(1) before each “see more”, worked fine for me
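For anyone who wants to try this, a rough sketch of where such a pause could go (the spider name and the selector are placeholders, not fbcrawl’s actual code). Note that time.sleep blocks Scrapy’s event loop, so it only really makes sense together with CONCURRENT_REQUESTS = 1:

import time
import scrapy

class SleepyFbSpider(scrapy.Spider):
    # Hypothetical illustration of the comment above
    name = 'fb_sleepy'

    def parse_post(self, response):
        # Pause for a second before following each "see more" link
        for href in response.xpath('//a[contains(text(), "see more")]/@href').extract():
            time.sleep(1)  # blocking sleep: serializes requests and slows the crawl
            yield response.follow(href, callback=self.parse_post)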
@ademjemaa thx for your suggestion! Probably a better way of accomplishing the same thing is to use the DOWNLOAD_DELAY parameter in settings.py. According to the Scrapy docs, the delay time is randomized:
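With RANDOMIZE_DOWNLOAD_DELAY left at its default of True, Scrapy waits a random interval between 0.5 and 1.5 times DOWNLOAD_DELAY between requests to the same site, which makes the request pattern less mechanical than a fixed time.sleep(1). A minimal sketch (values are illustrative):

# settings.py
DOWNLOAD_DELAY = 2               # base delay in seconds
RANDOMIZE_DOWNLOAD_DELAY = True  # default; actual wait is 0.5x to 1.5x DOWNLOAD_DELAY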