facebook-scraper: Scrape does not get full post when there is 2 layers of
@neon-ninja When a post has long text, or post_text with a ‘double layer’ of ‘See more’ that needs to be clicked, the extractor only manages to get the first layer. What I have tested, using facebook-scraper==0.2.42 from git-master:
- Using 2 different accounts (with 2 different cookies) in Chrome and also Firefox. I used EditThisCookie in Chrome and Cookie Quick Manager in Firefox
- Using both the Windows CLI and also a .py script
- With --encoding utf-8 and without encoding.
For the CLI I used this command:
facebook-scraper --filename najibFullPost1.csv --pages 5 najibrazak -c C:\\Users\\insane\\Desktop\\NajibRazak\\cookies.json -v --encoding utf-8
The output for 1 layer of See more is fine, but if there are two layers it only captures the first layer:
1 Layer output
2 layer output
I have read about others facing this issue, but none of those threads seems to solve the problem.
By using
>>> from facebook_scraper import get_posts, enable_logging
>>> import logging
>>> import pprint
>>> enable_logging(logging.DEBUG)
>>> for post in get_posts(post_urls=[10157944979490952]):
...     print(post['text'])
...
it returns the correct post text, but not when run from the CLI with a username.
Side note: I have a problem where the output file has an empty line between each record (row). I fixed it by adding newline='':
with open(filename, 'w', encoding=encoding, newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(list_of_posts)
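A minimal, self-contained sketch of the newline fix follows. The sample rows and the write_posts helper are hypothetical stand-ins, not part of facebook-scraper; only the open(..., newline='') and csv.DictWriter calls mirror the snippet above. The csv module already ends each row with "\r\n", so on Windows a file opened without newline='' has that translated to "\r\r\n", which shows up as a blank row between records.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for scraped posts.
list_of_posts = [
    {"post_id": "1", "text": "first post"},
    {"post_id": "2", "text": "second post"},
]
keys = list(list_of_posts[0].keys())


def write_posts(filename, rows, fieldnames, encoding="utf-8"):
    # newline='' disables newline translation in the text layer, so the
    # "\r\n" written by the csv module is stored unchanged.
    with open(filename, "w", encoding=encoding, newline="") as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames)
        dict_writer.writeheader()
        dict_writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "posts.csv")
write_posts(path, list_of_posts, keys)

# Read the raw bytes back to inspect the actual line endings.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

Reading the file back in binary mode makes the line endings visible, which is a quick way to confirm that no doubled carriage returns were written.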
About this issue
- State: closed
- Created 3 years ago
- Comments: 23
Ok, I think I see the problem. For me, the HTML is
but for you, it’s
which (?<=…\s)<a href="([^"]+) does not match, as data-gt precedes the href. This regex can be simplified - try this - https://github.com/kevinzg/facebook-scraper/commit/e7b2a50cb39ecccd66d43e0a8ff66b65f9e75311
Git master
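To illustrate why an attribute ahead of href defeats a pattern that expects href right after the tag name, here is a small sketch. Both anchor strings are hypothetical (the real markup is not shown in this thread), the lookbehind part of the original regex is left out, and the relaxed pattern is only one possible simplification, not necessarily what the linked commit does.

```python
import re

# Hypothetical anchors; the actual HTML from the thread is not available.
plain_anchor = '<a href="/story.php?id=1">More</a>'
data_gt_anchor = '<a data-gt="{}" href="/story.php?id=2">More</a>'

# Strict form: href must be the first attribute after "<a".
strict = re.compile(r'<a href="([^"]+)')

# Relaxed form: allow any other attributes between "<a" and href.
relaxed = re.compile(r'<a\b[^>]*?\shref="([^"]+)')


def first_href(pattern, html):
    """Return the first captured href, or None when the pattern does not match."""
    match = pattern.search(html)
    return match.group(1) if match else None


for name, pattern in (("strict", strict), ("relaxed", relaxed)):
    print(name, first_href(pattern, plain_anchor), first_href(pattern, data_gt_anchor))
```

The strict pattern finds the link only when href comes first, while the relaxed one finds it in both anchors.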
Almost, I used print(len(post["text"])) instead of print(post["text"]).
Actually the code block I posted doesn’t explicitly scrape 10157944979490952; it iterates through posts on najibrazak until it hits 10157944979490952 and then it stops. The reason I was asking for log messages is that in order to see the full text of a post, the scraper needs to “click” on it. It doesn’t matter if there’s one layer or two, as soon as the scraper sees
…it should fire off a request to https://m.facebook.com/10157944979490952. Logs for that should look like this:
Does your debug log output Fetching 10157944979490952?
This is working fine for me, the code:
outputs 2708. Do you get something different? Do you get any log messages that might indicate why?
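The iterate-until-found check described above can be sketched as follows. This is not the code from the thread, which was lost from this page: text_length_of is a hypothetical helper, the post_id key is an assumption about the post dictionaries, and fake_posts is stand-in data. In a real run the iterable would be the generator returned by get_posts for the page.

```python
def text_length_of(posts, target_id):
    """Walk posts in order and stop at the first one whose post_id matches."""
    for post in posts:
        # Compare as strings so int and str ids are treated the same.
        if str(post.get("post_id")) == str(target_id):
            return len(post["text"])
    return None


# Stand-in data; a real run would iterate over scraped posts instead.
fake_posts = iter([
    {"post_id": "111", "text": "short"},
    {"post_id": "10157944979490952", "text": "x" * 40},
    {"post_id": "222", "text": "never reached"},
])

length = text_length_of(fake_posts, 10157944979490952)
print(length)
```

Comparing the printed length between two environments is a compact way to tell whether the full text or only the first ‘See more’ layer was captured.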
I’ve committed your newline fix as https://github.com/kevinzg/facebook-scraper/commit/fb15eb5b745d09bbcfcbd45bf1425e8c349ab03c