instagram-php-scraper: Can't retrieve user medias
Using getMediasByUserId returns an error. The returned body is:
{"message": "forbidden", "status": "fail"}
Is there a way to get around this?
About this issue
- State: closed
- Created 6 years ago
- Reactions: 9
- Comments: 109 (27 by maintainers)
Looks like this cat and mouse game can’t be finished if we continue to discuss these updates and fixes in public…
It has already been explained above how X-Instagram-GIS is calculated, but the cat-and-mouse game continues, and today the hash looks like this in PHP:
$gis = md5(join(':', array( $page_data['rhx_gis'], json_encode($variables) )));
Looks like the x-instagram-gis hash no longer includes the user-agent string. #328
Need to test… But this solution works for me 😌
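For anyone wiring this into their own code, here is a minimal sketch of the formula quoted above, md5 of rhx_gis joined by ':' with the JSON-encoded variables. The function name computeGis and the placeholder rhx_gis value are made up for illustration and are not part of instagram-php-scraper; a real rhx_gis has to be parsed from an Instagram page first.

```php
<?php
declare(strict_types=1);

/**
 * Compute the X-Instagram-GIS value as md5("<rhx_gis>:<variables JSON>"),
 * following the formula quoted in this thread.
 * Hypothetical helper, not part of instagram-php-scraper.
 */
function computeGis(string $rhxGis, array $variables): array
{
    // Encode once and reuse the same string for the query parameter.
    // Assumption: the server hashes the exact JSON it receives, so the
    // two strings should match byte for byte.
    $json = json_encode($variables);

    return ['json' => $json, 'gis' => md5($rhxGis . ':' . $json)];
}

// 'placeholder_rhx_gis' is a dummy value for illustration only.
$signed = computeGis('placeholder_rhx_gis', ['id' => '5821462185', 'first' => 12, 'after' => '']);
$gisHeader = 'X-Instagram-GIS: ' . $signed['gis'];
```

$signed['json'] is what would go (URL-encoded) into the variables parameter, and $gisHeader into the request headers.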
Guys, I am feeling paranoid. I was trying to find out the rate limit of the GraphQL endpoint without a logged-in user, using Postman and a browser. Eventually I hit the limit and my IP was blocked. The strange part is that my phone was also blocked: whenever I try to scroll down on any Instagram profile in mobile Chrome incognito mode, I get a 429… and the facts are
Can it be that they have the MAC addresses of all my devices? Or can someone explain what's happening…
I have destroyed my phone and laptop, and am moving to the mountains ))
@kenjones91 omg, they’re closing more up. This will kill my site. 🙁
@andrewyoo confirmed, it works. The only thing you need to keep in mind: request and parse the tokens (rhx_gis and csrftoken) with the same user agent as for the other requests. Another problem is rate limits. It looks like the limits are per IP.
Below is simple code to perform HTML parsing and get media data from JS (Instagram.php)
@knissophiliac you can get rhx_gis just from a GET on ‘https://www.instagram.com’. Also, it’s pretty much returned in the HTML of most pages.
@350d It’s still working for me…
@andrewyoo x-instagram-gis is calculated with csrf_token, rhx_gis, window.navigator.userAgent and the variables from the API call. Here is my refactored hashing function:
Call this function like this:
gishash("{\"id\":\"5821462185\",\"first\":40,\"after\":\"\"}", rhx_gis, csrf_token)
rhx_gis and csrf_token can be parsed from any embed page source (CORS is available on these links);
I’ve tried to achieve this via JavaScript, but here is the problem: I can’t set these custom headers due to the allow-origin limitation for custom headers on Instagram’s side. This is not a problem in PHP, I guess.
@raiym @rhcarlosweb @gthedev hi! I really don’t know PHP well enough to help with this one, but maybe the quick hotfix would be:
ACCOUNT_MEDIAS = 'https://instagram.com/graphql/query/?query_id=17888483320059182&id={user_id}&first={count}&after={max_id}';
to:
ACCOUNT_MEDIAS = 'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"{user_id}","first":{count},"after":"{max_id}"}';
Maybe this is not the final solution, but at least media queries will work (for some time 😅)
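A small sketch of how the new URL could be built so that the JSON in variables is always valid and URL-encoded, instead of substituting into a template string. buildAccountMediasUrl is a hypothetical helper name; the query_hash is the one from the hotfix above and may stop working at any time.

```php
<?php
declare(strict_types=1);

// query_hash copied from the hotfix above; not guaranteed to stay valid.
const MEDIA_QUERY_HASH = '42323d64886122307be10013ad2dcc44';

/**
 * Build the GraphQL media URL for one page of a user's medias.
 * Hypothetical helper, not part of instagram-php-scraper.
 */
function buildAccountMediasUrl(string $userId, int $count, string $after = ''): string
{
    // json_encode takes care of quoting; http_build_query URL-encodes it.
    $variables = json_encode(['id' => $userId, 'first' => $count, 'after' => $after]);

    return 'https://www.instagram.com/graphql/query/?' . http_build_query([
        'query_hash' => MEDIA_QUERY_HASH,
        'variables'  => $variables,
    ]);
}

$url = buildAccountMediasUrl('5821462185', 12);
```

Leaving $after empty asks for the first page; later pages would pass a cursor taken from the previous response.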
I made a solution for this one, but in Python, using an automated browser to retrieve the cookies and the new URL. I really don’t know what a PHP implementation would look like, but these are the steps:
'https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"<user_id>","first":<items_to_retrieve>,"after":"<end_cursor>"}'
where <end_cursor> is either blank or the end_cursor from the previous request, and <items_to_retrieve> is the page size: Instagram web uses 12, and I tested successfully with 20.
Disclaimer 1: no authorization needed!
Disclaimer 2: actually I reused the same cookies several times and it worked. The expiry seems to be set to one year. But I don’t know if Instagram will catch the usage of cookies from many different clients if they are hardcoded into this scraper!
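In PHP the cursor handling described above could look roughly like the sketch below. It only shows the loop: $fetchPage is a hypothetical callable that would do the HTTP request for a given cursor and return the decoded JSON. The data.user.edge_owner_to_timeline_media layout is an assumption that is not confirmed anywhere in this thread, so the keys may need adjusting.

```php
<?php
declare(strict_types=1);

/**
 * Collect media nodes page by page, feeding each page's end_cursor back in
 * as the next "after" value.
 * ASSUMPTION: the response layout used below is a guess; adjust the keys
 * to whatever the endpoint really returns.
 */
function collectMedias(callable $fetchPage, int $maxPages = 5): array
{
    $medias = [];
    $after = ''; // a blank cursor requests the first page

    for ($page = 0; $page < $maxPages; $page++) {
        $response = $fetchPage($after);
        $timeline = $response['data']['user']['edge_owner_to_timeline_media'] ?? null;
        if ($timeline === null) {
            break; // unexpected shape, for example an error body
        }
        foreach ($timeline['edges'] ?? [] as $edge) {
            $medias[] = $edge['node'];
        }
        $info = $timeline['page_info'] ?? [];
        if (empty($info['has_next_page']) || empty($info['end_cursor'])) {
            break; // no further pages
        }
        $after = $info['end_cursor'];
    }

    return $medias;
}

// Stub standing in for the network so the sketch runs offline.
$fakePages = [
    ''   => ['edges' => [['node' => ['id' => 'a']]], 'page_info' => ['has_next_page' => true, 'end_cursor' => 'c1']],
    'c1' => ['edges' => [['node' => ['id' => 'b']]], 'page_info' => ['has_next_page' => false, 'end_cursor' => null]],
];
$fetch = fn(string $after): array => ['data' => ['user' => ['edge_owner_to_timeline_media' => $fakePages[$after]]]];
$all = collectMedias($fetch);
```

The $maxPages cap is there on purpose, so a bad cursor cannot turn into an endless loop that burns through the rate limit.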
Python implementation:
@Scottzonn something like this, and it still works 😃 …
But in this case the scraper is not necessary… Hope a real solution will be found.
I just solved the problem by parsing the HTML page of the account and then taking the JSON from the JavaScript. Yes, it is just 12 medias, but it works 😃 I “love” Instagram more and more )))
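A rough sketch of that "take the JSON from the JavaScript" approach is below. It assumes the profile page embeds its data as window._sharedData = {...}; inside a script tag. That marker is not quoted in this thread and Instagram can change it at any time, so treat the regex as a guess. extractSharedData is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Extract the JSON object assigned to window._sharedData in a page.
 * ASSUMPTION: the page contains "window._sharedData = {...};</script>".
 * Returns null when the marker is missing or the JSON does not decode.
 */
function extractSharedData(string $html): ?array
{
    if (!preg_match('/window\._sharedData\s*=\s*(\{.*?\});\s*<\/script>/s', $html, $m)) {
        return null;
    }
    $data = json_decode($m[1], true);

    return is_array($data) ? $data : null;
}

// Tiny stand-in for a downloaded profile page.
$sample = '<script>window._sharedData = {"rhx_gis":"abc","entry_data":{"ProfilePage":[]}};</script>';
$shared = extractSharedData($sample);
```

If rhx_gis is present in that object, it could be read from the same decoded array, which would avoid a second request.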
Does anyone know more details on the limits? What is the limit, is it based on user/ip/both?
The maximum is 200 requests per hour! Check the details here: https://stackoverflow.com/questions/49585077/instagram-api-limit-reduced-to-200-from-5000
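If the 200 per hour figure is right, that works out to one request every 18 seconds. Here is a minimal sketch of spacing requests accordingly. Note that the figure comes from the linked question about the official API and is not verified for the GraphQL endpoint; secondsToWait is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Seconds to wait before the next request so that no more than $maxPerHour
 * requests go out per hour. The default of 200 is taken from the comment
 * above and is an unverified assumption for the GraphQL endpoint.
 */
function secondsToWait(float $lastRequestAt, float $now, int $maxPerHour = 200): float
{
    $minInterval = 3600 / $maxPerHour; // 18 seconds at 200 requests per hour

    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Example with fixed timestamps: 5 seconds have passed since the last request.
$wait = secondsToWait(1000.0, 1005.0);
```

In a real loop the timestamps would come from microtime(true), and the result could be passed to usleep() after converting it to microseconds.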
This doesn’t work today.
Got 403 status. 😦
@footniko I used the following headers which are working for anonymous crawling:
I’ve just realized that x-instagram-gis is just an md5 hash 😀
M… maybe Instagram is just waiting for this cookie name, no matter the value, because setting it to some random value, e.g. 42, works fine too!
But yes, when ig_pr is not present, it returns a 403 code.
Nice user private data protection system, anyway 😅
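To make the ig_pr observation concrete, here is a small sketch that builds a Cookie header which always carries ig_pr. The value 42 is arbitrary, as reported above. buildCookieHeader is a hypothetical helper, and which other cookies are needed is not settled in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Build a "Cookie: ..." header line that always includes ig_pr.
 * Per the comment above, only the presence of ig_pr seems to matter,
 * so the default value "42" is arbitrary.
 */
function buildCookieHeader(array $cookies = []): string
{
    $cookies += ['ig_pr' => '42']; // keeps a caller-supplied ig_pr if present

    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }

    return 'Cookie: ' . implode('; ', $pairs);
}

// 'abc' is a dummy csrftoken value for illustration.
$cookieHeader = buildCookieHeader(['csrftoken' => 'abc']);
```

The resulting string can go into the array passed to curl via CURLOPT_HTTPHEADER, next to the User-Agent header.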