facebook-scraper: Unable to collect posts beyond a certain number due to Temporary Block

Hi,

First of all, this is a really great tool, so thank you very much for your work! I want to scrape some private groups. However, every time I try, I get the message "You are Temporarily Blocked" after scraping anywhere from 100 to 9000 posts, even though the group I’m trying to scrape has far more posts. I have tested alt accounts too. Is there any way to stop Facebook from blocking me so quickly? Or is there a way to continue from where I left off when I was blocked? Furthermore, I’m using "allow_extra_requests": True since I want to download all photos at max quality. Could you add get_photos for groups to speed up scraping, or is there any other way to get the link to the first photo (at max quality) of every post without using allow_extra_requests, which is slow?

About this issue

State: closed
Created 3 years ago
Reactions: 1
Comments: 18 (2 by maintainers)

Most upvoted comments

Increasing posts_per_page might help, as you’d then make fewer requests. Adding some time.sleep calls might also help by reducing the rate at which you make requests. Yes, you can continue from a pagination URL by passing it as the start_url argument to get_posts. These pagination URLs appear in the logs if you have debug logging enabled, or you can pass a callback function as request_url_callback to get_posts to capture them. Here’s some sample code:

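import time
from facebook_scraper import exceptions, get_posts, set_cookies

results = []
start_url = None  # most recent pagination URL; None means start from the first page


def handle_pagination_url(url):
    # get_posts passes pagination URLs to this callback; keep the latest one
    global start_url
    start_url = url


set_cookies("cookies.txt")
while True:
    try:
        # After a ban, this restarts from the last pagination URL seen
        for post in get_posts("Nintendo", page_limit=None, start_url=start_url, request_url_callback=handle_pagination_url):
            print(len(results))
            results.append(post)
        print("All done")
        break
    except exceptions.TemporarilyBanned:
        # Wait out the block, then loop around and resume
        print("Temporarily banned, sleeping for 10m")
        time.sleep(600)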

Note: https://github.com/kevinzg/facebook-scraper/commit/f3c8948ae04414932899686c89e696306f37ce1f simplifies this code a bit by making it possible to pass a start_url of None.
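
The sample above keeps the last pagination URL only in memory, so it is lost if the script itself is restarted. Below is a minimal sketch of one way to persist it, plus a small wrapper for the time.sleep suggestion. The helper names (save_start_url, load_start_url, throttled) and the file name scrape_state.json are hypothetical and not part of facebook-scraper.

```python
import json
import time
from pathlib import Path

# Hypothetical file name, chosen for illustration only.
STATE_FILE = Path("scrape_state.json")


def save_start_url(url, path=STATE_FILE):
    """Write the most recent pagination URL to disk."""
    path.write_text(json.dumps({"start_url": url}))


def load_start_url(path=STATE_FILE):
    """Return the saved pagination URL, or None if nothing was saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("start_url")


def throttled(items, delay_seconds=1.0):
    """Yield items from any iterable, pausing after each one."""
    for item in items:
        yield item
        time.sleep(delay_seconds)
```

Assuming the callback receives the URL as its only argument, as in the sample above, save_start_url could be passed as request_url_callback and load_start_url() as start_url, with the get_posts(...) call wrapped in throttled(...). Passing a start_url of None relies on the commit mentioned in the note above.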

AFAIK, Facebook only provides the high-resolution image URL when you click on the photo, which means an extra request for each photo.

neon-ninja on Jun 13, 2021

Probably nothing to worry about, so long as that image URL extraction worked

neon-ninja on Jun 2, 2021

@svlieri I think you misunderstand the point of start_url: it’s intended to take a pagination URL (a page of multiple posts). https://www.facebook.com/zuck/posts/10113982963226401 points to an individual post, not a page, so it’s not a valid pagination URL. The URL also needs to be on the m.facebook.com domain, not the facebook.com domain.
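
The two points in this comment can be turned into a rough pre-check before a URL is passed as start_url. This is only a heuristic sketch: is_plausible_start_url is a hypothetical helper, not part of facebook-scraper, and treating "/posts/" in the path as the sign of an individual post is an assumption based on the example URL above.

```python
from urllib.parse import urlparse


def is_plausible_start_url(url):
    """Heuristic check only; it cannot confirm a URL is a real pagination URL."""
    parsed = urlparse(url)
    # The comment above says the URL must be on m.facebook.com.
    if parsed.hostname != "m.facebook.com":
        return False
    # Assumption: links to individual posts contain "/posts/" in the path.
    return "/posts/" not in parsed.path
```

A URL that passes this check may still be rejected by Facebook; the check only filters out the two mistakes described in the comment.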

neon-ninja on Oct 22, 2021

It seems pagination URLs can be accessed with different cookies, or even without cookies.

neon-ninja on Jun 2, 2021