fbcrawl: Blocked after crawling

Don’t use your personal facebook profile to crawl

Hello, we’re starting to experience some blocking by Facebook. After a certain number of “next pages” have been visited, the profile is temporarily suspended for about an hour.

If scrapy ends abruptly with this error, your account has been blocked:

  File "/fbcrawl/fbcrawl/spiders/fbcrawl.py", line 170, in parse_page
    if response.meta['flag'] == self.k and self.k >= self.year:
KeyError: 'flag'

This prevents you from visiting any page on mbasic.facebook.com during the blocking period. However, the block does not seem to be fully enforced on m.facebook.com and facebook.com: you can still access public pages, but not private profiles!
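The `KeyError` above comes from `response.meta['flag']` in `parse_page`: once Facebook starts serving the block page, the expected meta key is never set. A minimal sketch of a defensive check (the helper name `has_valid_flag` is hypothetical, not part of fbcrawl) that lets the spider stop gracefully instead of crashing:

```python
# Hypothetical guard for the condition shown in the traceback above.
# response.meta may lack the 'flag' key when the account is blocked,
# so .get() with a None sentinel avoids the raw KeyError.
def has_valid_flag(meta, k, year):
    """Return True only when the crawl flag is present and in range."""
    flag = meta.get('flag')  # None instead of KeyError when blocked
    if flag is None:
        return False         # caller can log the block and stop/retry
    return flag == k and k >= year
```

In the spider this would replace the bare `response.meta['flag'] == self.k and self.k >= self.year` comparison, so a block surfaces as a clean early exit rather than an abrupt crash.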


If you are experiencing this issue, in settings.py set:

CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 1

This forces sequential crawling and will noticeably slow the crawler down, but it assures a better final result. Increase DOWNLOAD_DELAY if you’re still being blocked. More experiments are needed to assess the situation; please report your findings and suggestions here.
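As a fuller sketch, the two settings above can live in settings.py alongside Scrapy’s built-in throttling options (the specific values here are examples, not recommendations from the maintainers):

```python
# settings.py -- example anti-blocking configuration (values are illustrative)

CONCURRENT_REQUESTS = 1      # one request at a time: sequential crawling
DOWNLOAD_DELAY = 3           # base delay in seconds; raise if still blocked

# Scrapy randomizes the actual wait between 0.5x and 1.5x of
# DOWNLOAD_DELAY; this is on by default.
RANDOMIZE_DOWNLOAD_DELAY = True

# Optionally, the AutoThrottle extension adjusts the delay dynamically
# based on server response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
```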

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Comments: 15 (1 by maintainers)

Most upvoted comments

hey, add a time.sleep(1) before each “see more”, worked fine for me

@ademjemaa thx for your suggestion! Probably a better way of accomplishing the same thing is to use the DOWNLOAD_DELAY parameter in settings.py. According to the Scrapy docs, the delay time is randomized:

Scrapy doesn’t wait a fixed amount of time between requests, but uses a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
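So with DOWNLOAD_DELAY = 3, each wait falls somewhere between 1.5 and 4.5 seconds. A tiny sketch of that behavior (a mimic of Scrapy’s randomization, not its actual implementation):

```python
import random

def next_delay(base):
    """Mimic Scrapy's randomized wait: uniform in [0.5*base, 1.5*base]."""
    return random.uniform(0.5 * base, 1.5 * base)

# e.g. next_delay(3) yields a value between 1.5 and 4.5 seconds
```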