instaloader: 429 redirect PR does not work
I was testing the PR https://github.com/instaloader/instaloader/pull/389 and found it doesn’t work.
This is the behaviour I’m experiencing:
I’m attempting to download about 900 profiles while logged in.
The download process from this app (from my understanding) consists of 2 parts:
1 - pre-processing, which detects profile IDs, renames, etc.
2 - the actual downloading of images/videos.
The 429 issue starts occurring in the pre-processing part, about halfway through. Every profile lookup after that point errors out with a 429, so those profiles are never marked as “valid” for the download part.
Possible fix: I think a 429 timeout (wait, then retry) needs to be added to the profile pre-processing.
From the looks of it, it simply tries each profile 3 times in a row, errors out, and moves on to the next.
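To make the suggested fix concrete, here is a minimal sketch of waiting between retries instead of retrying immediately. It is not instaloader's code: `TooManyRequests`, `fetch_with_backoff` and the `fetch` callable are hypothetical names used only for illustration, and the delays are arbitrary placeholders.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for whatever error a client raises on HTTP 429."""


def fetch_with_backoff(fetch, name, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call fetch(name), sleeping with a doubling delay after each 429.

    `fetch` is an assumed callable that raises TooManyRequests when
    rate limited. The last failure is re-raised once attempts run out.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(name)
        except TooManyRequests:
            if attempt == max_attempts:
                raise
            sleep(delay)  # wait before retrying instead of hammering the endpoint
            delay *= 2    # double the wait after every consecutive 429
```

Given how long the bans reported in this thread last, a real delay would probably need to be much longer than the placeholder values here.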
Example log below:
Pre-processing
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
x****: JSON Query x****/: 429 Too Many Requests
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
x****: JSON Query x****/: 429 Too Many Requests
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query x****/: 429 Too Many Requests [retrying; skip with ^C]
x****: JSON Query x****/: 429 Too Many Requests
JSON Query y****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query y****/: 429 Too Many Requests [retrying; skip with ^C]
y****: JSON Query y****/: 429 Too Many Requests
JSON Query y****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query y****/: 429 Too Many Requests [retrying; skip with ^C]
y****: JSON Query y****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
JSON Query z****/: 429 Too Many Requests [retrying; skip with ^C]
z****: JSON Query z****/: 429 Too Many Requests
Normal download process (for half of profiles)
Downloading 478 profiles: a**** a**** b*** ... k****
[downloading part works correctly here]
(The above should have 900 profiles)
Final Error log
Errors occured:
a****: JSON Query a****/: Could not find "window._sharedData" in html response.
c****: JSON Query c****/: Could not find "window._sharedData" in html response.
d****: JSON Query d****/: Could not find "window._sharedData" in html response.
e****: JSON Query e****/: Could not find "window._sharedData" in html response.
g****: JSON Query g****/: Could not find "window._sharedData" in html response.
g****: JSON Query g****/: Could not find "window._sharedData" in html response.
l****: JSON Query l****/: 429 Too Many Requests
l****: JSON Query l****/: 429 Too Many Requests
l****: JSON Query l****/: 429 Too Many Requests
l****: JSON Query l****/: 429 Too Many Requests
l****: JSON Query l****/: 429 Too Many Requests
l****: JSON Query l****/: 429 Too Many Requests
...
x****: JSON Query x****/: 429 Too Many Requests
x****: JSON Query x****/: 429 Too Many Requests
y****: JSON Query y****/: 429 Too Many Requests
y****: JSON Query y****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
z****: JSON Query z****/: 429 Too Many Requests
About this issue
- State: closed
- Created 5 years ago
- Comments: 23 (1 by maintainers)
I had my private/public split anyway, as I’d run public ones in parallel and only use logged-in API calls for the ones that needed it.
It looks like the initial get-profile-ID call is now massively rate limited, so it needs to be done a different way.
From the looks of it you can get the ID from the source HTML of ig.com/profilename, which may or may not be rate limited. If it’s a standard web request and not through the API, hopefully it won’t be rate limited.
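As a rough sketch of that idea, the snippet below pulls an ID out of a page's HTML. It assumes the page embeds a `window._sharedData = {...};` JSON blob (the name that appears in the error log above) and that the ID sits under `entry_data.ProfilePage[0].graphql.user.id`. That JSON path is an assumption about the page layout, not something confirmed in this thread, and `profile_id_from_html` is a hypothetical helper.

```python
import json
import re
from typing import Optional

# Matches the embedded JSON blob; the surrounding markup is an assumption.
_SHARED_DATA_RE = re.compile(
    r"window\._sharedData\s*=\s*(\{.*?\});</script>", re.DOTALL
)


def profile_id_from_html(html: str) -> Optional[str]:
    """Return the profile ID found in the page source, or None if absent."""
    match = _SHARED_DATA_RE.search(html)
    if match is None:
        return None  # same situation as the 'Could not find "window._sharedData"' error
    data = json.loads(match.group(1))
    try:
        # Assumed JSON path; adjust if the page layout differs.
        return data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["id"]
    except (KeyError, IndexError, TypeError):
        return None
```

Whether plain page requests are rate limited differently from API calls is still an open question here; the sketch only covers the parsing step.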
Might need to change the process from:
to:
Today’s test:
Batch size: 20
Sleep between each: 15 mins
Duration before 429s: 7.5 hrs
Batches processed: 24
From the looks of it, 24 batches x 20 profiles = 480 profiles. At 2-3 API calls per profile (not sure how many are actually being made), that is roughly 960-1440 calls, which puts the API limit at around 1000-1500 per 24 hrs.
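The estimate can be reproduced with a few lines of arithmetic. The 2-3 calls per profile figure is the guess from the comment above, not a measured value.

```python
batches_processed = 24
profiles_per_batch = 20
calls_per_profile = (2, 3)  # guessed range, not measured

profiles = batches_processed * profiles_per_batch
low_estimate = profiles * calls_per_profile[0]
high_estimate = profiles * calls_per_profile[1]

print(profiles, low_estimate, high_estimate)  # 480 960 1440
```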
From my testing: I had no account activity for a few days beforehand, and was using fast update (most profiles only had a couple of downloads).
I used a batch size of 20 with a 5 min sleep between batches.
After 4 hours, at batch 24, I started getting 429s.
I’m going to try increasing it to 10 min sleeps once the 429s time out, and will keep slowly adjusting until it works.
Edit:
Ban lasted about 18-23 hours.
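The batching approach used in these tests can be sketched roughly as follows. `run_in_batches` and the `process` callable are hypothetical names, not part of instaloader; the defaults mirror the batch size of 20 and the planned 10 min sleep.

```python
import time
from typing import Callable, List, Sequence


def split_into_batches(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_batches(
    profiles: Sequence[str],
    process: Callable[[str], None],
    batch_size: int = 20,
    pause_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process profiles batch by batch, pausing between batches.

    `process` is an assumed callable that downloads one profile.
    Returns the number of batches that were processed.
    """
    batches = split_into_batches(profiles, batch_size)
    for index, batch in enumerate(batches):
        for profile in batch:
            process(profile)
        if index < len(batches) - 1:
            sleep(pause_seconds)  # no pause needed after the final batch
    return len(batches)
```

With 900 profiles and these defaults that is 45 batches and 44 pauses, so about 7.3 hours of sleeping on top of the download time.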