facebook-scraper: Unable to collect posts beyond a certain number due to Temporary Block

Hi,

First of all, this is a really great tool, so thank you very much for your work! I want to scrape some private groups. However, every time I try, I get the message "You are Temporarily Blocked" after scraping anywhere from 100 up to 9000 posts, even though the group I'm trying to scrape has far more posts. I have tested alt accounts too. Is there any possible solution so that Facebook doesn't block me so quickly every time? Or is there a way to continue from where I left off when I was blocked? Furthermore, I'm using "allow_extra_requests": True since I want to download all photos at maximum quality. Could you add get_photos for groups to speed up scraping, or is there any other way to get the link of the first photo (at max quality) of every post faster, without using allow_extra_requests, which is slow?

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (2 by maintainers)

Most upvoted comments

Increasing posts_per_page might help, as you'd then make fewer requests. Adding some time.sleep lines might help reduce the rate at which you're making requests. Yes, you can continue from a pagination URL by passing it as the start_url argument to get_posts. These pagination URLs can be seen in the logs if you have debug logging enabled, or you can pass a callback function as request_url_callback to get_posts to extract them yourself. Here's some sample code:

import time
from facebook_scraper import get_posts, set_cookies, exceptions

results = []
start_url = None

def handle_pagination_url(url):
    # Called with each pagination URL, so we always know where to resume
    global start_url
    start_url = url

set_cookies("cookies.txt")
while True:
    try:
        for post in get_posts("Nintendo", page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url):
            print(len(results))
            results.append(post)
        print("All done")
        break
    except exceptions.TemporarilyBanned:
        # Wait out the temporary block, then resume from the last pagination URL
        print("Temporarily banned, sleeping for 10m")
        time.sleep(600)

Note: https://github.com/kevinzg/facebook-scraper/commit/f3c8948ae04414932899686c89e696306f37ce1f simplifies this code a bit by making it possible to pass a start_url of None.

AFAIK, Facebook only provides the high-resolution image URL if you click on the photo, which involves an extra request for each photo.
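Since each photo costs an extra request, one way to reduce the chance of a block is to space those requests out. A minimal sketch of that idea, where fetch_full_image is a hypothetical stand-in for whatever performs the extra per-photo request (not a facebook_scraper function):

```python
import time

def throttled_fetch(photo_urls, fetch_full_image, delay=2.0):
    """Fetch full-resolution image URLs one at a time, pausing between
    requests. `fetch_full_image` is assumed to take a photo URL and
    return the high-resolution URL via one extra request."""
    results = []
    for url in photo_urls:
        results.append(fetch_full_image(url))
        time.sleep(delay)  # pause between requests to reduce block risk
    return results
```

The delay value is a guess; tuning it (or combining it with the TemporarilyBanned retry loop above) is still trial and error.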

Probably nothing to worry about, so long as that image URL extraction worked.

@svlieri I think you misunderstand the point of start_url - it's intended to take a pagination URL (a page of multiple posts). https://www.facebook.com/zuck/posts/10113982963226401 points to an individual post, not a page, so it's not a valid pagination URL. It also needs to be on the m.facebook.com domain, not facebook.com.
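A rough sanity check for the two problems above (wrong domain, single-post permalink) can be written with the standard library. This is an assumption about what pagination URLs look like, not the library's own validation:

```python
from urllib.parse import urlparse

def looks_like_pagination_url(url):
    """Heuristic check: a start_url should be on m.facebook.com and
    should not be a single-post permalink like /<user>/posts/<id>."""
    parsed = urlparse(url)
    if parsed.netloc != "m.facebook.com":
        return False  # wrong domain: must be the mobile site
    if "/posts/" in parsed.path:
        return False  # points at one post, not a page of posts
    return True
```

For example, the URL from the comment above fails both checks, so passing it as start_url would not work.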

It seems pagination URLs can be accessed with different cookies, or even without cookies.
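If pagination URLs really do work across cookie sets, you could persist the last one to disk and resume a later run (possibly with a different account) from where the previous run stopped. A small sketch, with a hypothetical state-file name:

```python
import json
import os

STATE_FILE = "scrape_state.json"  # hypothetical filename, pick your own

def save_start_url(url):
    """Persist the most recent pagination URL so a later run can resume."""
    with open(STATE_FILE, "w") as f:
        json.dump({"start_url": url}, f)

def load_start_url():
    """Return the saved pagination URL, or None on a fresh start."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f).get("start_url")
    return None
```

Calling save_start_url from the request_url_callback handler in the sample code above, and seeding start_url with load_start_url() at startup, would make the resume point survive restarts.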