facebook-scraper: Unable to collect posts beyond a certain number due to Temporary Block
Hi,
First of all, this is a really great tool, so thank you very much for your work!
I want to scrape some private groups. However, every time I try, I get the message You are Temporarily Blocked after scraping anywhere from 100 to 9000 posts, even though the group I’m trying to scrape has far more posts. I have tested alt accounts too. Is there any way to stop Facebook from blocking me so quickly? Or is there a way to continue from where I left off when I was blocked?
Furthermore, I’m using "allow_extra_requests": True since I want to download all photos to max quality. Could you add get_photos for groups to speed up scraping or is there any other way I could get the link of the first photo (at max quality) of every post faster without using allow_extra_requests which is slow?
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 18 (2 by maintainers)
Increasing posts_per_page might help, as then you’d make fewer requests. Adding some time.sleep lines might help reduce the rate at which you’re making requests.
Yes, you can continue from a pagination url, by passing the url as the start_url argument to get_posts. These pagination URLs can be seen in the logs if you have debug logging enabled, or you can pass a callback function as request_url_callback to get_posts to handle extracting these pagination urls. Here’s some sample code:
Note: https://github.com/kevinzg/facebook-scraper/commit/f3c8948ae04414932899686c89e696306f37ce1f simplifies this code a bit by making it possible to pass a start_url of None.
AFAIK, facebook only provides the high resolution image URL if you click on the photo, which involves an extra request for each photo.
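The resume approach described above (remember the latest pagination URL through request_url_callback, then pass it back as start_url after a block) could be sketched roughly as follows. This is a minimal sketch, not the maintainer's original sample: scrape_with_resume, its parameters, and the retry and wait values are hypothetical, and it assumes each post is a dict with a post_id key and that start_url may be None on the first attempt. The scraping function and the exception type are passed in, so nothing here depends on a particular library version.

```python
import time


def scrape_with_resume(get_posts_fn, banned_exc, max_retries=3,
                       wait_seconds=600, sleep=time.sleep, **kwargs):
    """Collect posts, resuming from the last pagination URL after a block.

    get_posts_fn must accept start_url and request_url_callback keyword
    arguments. Returns (posts, last_url); last_url is the most recent
    pagination URL seen and can be saved to resume in a later run.
    """
    last_url = None  # most recent pagination URL reported by the callback
    seen = {}        # post_id -> post, so a re-fetched page adds no duplicates

    def remember(url):
        nonlocal last_url
        last_url = url

    for attempt in range(max_retries + 1):
        try:
            for post in get_posts_fn(start_url=last_url,
                                     request_url_callback=remember, **kwargs):
                seen.setdefault(post["post_id"], post)
            break  # the generator finished without being blocked
        except banned_exc:
            if attempt == max_retries:
                break  # give up; the caller can retry later from last_url
            sleep(wait_seconds)  # wait before retrying from last_url
    return list(seen.values()), last_url


# Possible use with facebook-scraper (not run here; the group ID is a
# placeholder and the option values are assumptions):
#   from facebook_scraper import get_posts, exceptions
#   posts, resume_url = scrape_with_resume(
#       get_posts, exceptions.TemporarilyBanned,
#       group="<group id>", options={"posts_per_page": 200})
```

Because the page at the saved URL is requested again after a block, posts already collected from it can come back a second time, which is why the sketch deduplicates by post_id.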
Probably nothing to worry about, so long as that image URL extraction worked
@svlieri I think you misunderstand the point of start_url - it’s intended to take a pagination url (page of multiple posts). https://www.facebook.com/zuck/posts/10113982963226401 points to an individual post, not a page. It’s not a valid pagination url. It also needs to be in the m.facebook.com domain, not the facebook.com domain.
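As a rough illustration of the domain requirement, a saved URL could be checked before it is passed as start_url. This is only a sketch: is_mobile_facebook_url is a hypothetical helper, the second URL in the usage lines is a made-up placeholder, and the check looks at the host only, so it cannot tell a pagination URL from an individual post.

```python
from urllib.parse import urlparse


def is_mobile_facebook_url(url):
    """Return True if the URL's host is exactly m.facebook.com."""
    return urlparse(url).hostname == "m.facebook.com"


# The post URL from this thread is on the wrong domain:
print(is_mobile_facebook_url("https://www.facebook.com/zuck/posts/10113982963226401"))
# A made-up placeholder URL on the mobile domain passes the host check:
print(is_mobile_facebook_url("https://m.facebook.com/groups/example?cursor=abc"))
```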
It seems pagination URLs can be accessed with different cookies, or without cookies even