GetOldTweets3: HTTP Error, Gives 404 but the URL is working
Hi, I had a script running over the past few weeks and earlier today it stopped working. I keep receiving HTTPError 404, but the link provided in the error still brings me to a valid page.
Code is (all mentioned variables are established, and the error specifically happens in the Manager when I check via debugging):
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
    .setMaxTweets(max_count)\
    .setSince(begin_timeframe)\
    .setUntil(end_timeframe)
scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)
The error message is the standard 404 error "An error occured during an HTTP request: HTTP Error 404: Not Found. Try to open in browser:" followed by the valid link.
As I have changed nothing locally, I am wondering whether something has happened with my configuration more than anything else, but I'm also wondering if others are experiencing this.
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 74
- Comments: 144
Links to this issue
Commits related to this issue
- Disabled the Twitter side because of Mottl/GetOldTweets3#98. — committed to ituethoslab/navcom-data-downloader by xmacex 4 years ago
Unfortunately the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can only search 5,000 tweets per month with the Twitter API.
I hope GetOldTweets starts working again as soon as possible, otherwise I cannot complete my master's thesis.
I used the query search below and it returns the links of the tweets.
I obtained the tweet_id values and then used tweepy to extract the tweets, as I needed more attributes (this may not be the best way to do it):
Note that tweet_ids is a list of 100 tweet ids.
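The two-step workflow described above (collect IDs with snscrape, then hydrate them with tweepy) can be sketched roughly as follows. This is a minimal illustration, not the commenter's exact code: the batching helper and placeholder credentials are assumptions, and statuses_lookup is the tweepy v3 name (tweepy v4 renames it lookup_statuses).

```python
def chunks(ids, size=100):
    """Split a list of tweet IDs into batches of at most `size`
    (statuses_lookup accepts at most 100 IDs per call)."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def hydrate(tweet_ids, api):
    """Look up full tweet objects, one batch of <=100 IDs at a time."""
    tweets = []
    for batch in chunks(tweet_ids):
        tweets.extend(api.statuses_lookup(batch, tweet_mode="extended"))
    return tweets

# Usage (requires a Twitter developer account; keys are placeholders):
#   import tweepy
#   auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
#   auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
#   api = tweepy.API(auth, wait_on_rate_limit=True)
#   tweets = hydrate(tweet_ids, api)
```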
@burakoglakci thanks for sharing your experience and work with us!! It is really appreciated and it helped me a lot. I want to ask: what would the query string be (using snscrape) if we want to get tweets by longitude and latitude as well, and how can we find the geo-location of any city/country on Twitter? Thanks in advance 😃
For those who are still struggling to download tweets as CSV with snscrape, this works absolutely fine for me. Configuration: Windows 7 SP1 (64-bit), Python 3.8.6. pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in a new Jupyter Notebook and make sure it is using the Python 3.8.6 kernel.
Using code from the above comments.
For those of you who have been using snscrape: can you post any code examples doing a simple query search in a script rather than the console? The lack of documentation is making this more trial and error as I learn the modules.
So far the only method of scraping tweets that still seems to work is snscrape's jsonl mode. A comment in this Twint issue explains how to do this. Please note you will need Python 3.8 and the latest development version of snscrape. This doesn't export the .json result to .csv, though. For that I used an online converter at first; later I used the pandas library in Python for the conversion.
I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. 404 error is gone but now there is 400 bad request error.
Yes, refer to my article as I mentioned above where I cover the basics of using snscrape instead because GetOldTweets3 is basically obsolete due to changes in Twitter’s API https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af
In regards to your specific use case, with snscrape you just put whatever query you want inside the quotes in the TwitterSearchScraper method and adjust the since and until operators to whatever time range you want. I created a code snippet for you below. You can take out the i > 500 check if you don't want to restrict the number of tweets and just want every single tweet.
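As a rough sketch of that pattern: the keyword, dates, and 500-tweet cap below are illustrative values, not the commenter's exact snippet, and running it requires the development version of snscrape on Python 3.8.

```python
def build_query(term, since, until):
    """Compose a Twitter search query with since/until operators."""
    return f"{term} since:{since} until:{until}"

def scrape(term, since, until, max_tweets=500):
    """Yield up to max_tweets (id, date, content) tuples via snscrape."""
    import snscrape.modules.twitter as sntwitter  # dev version from GitHub
    query = build_query(term, since, until)
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= max_tweets:  # remove this check to collect every matching tweet
            break
        yield tweet.id, tweet.date, tweet.content
```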
@DV777 Hi!
https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032 you can get any tweet objects you want using the method described here. I created a script for my own work, and I share it below. I hope it’s useful 😃 You must have a Twitter developer account to use this method.
Hey! For the ones struggling to use snscrape, I put together a little library to download tweets using snscrape/tweepy according to customizable queries. Although it’s still a work in progress, check this repo if you want to give it a try 😃
I don't recommend using Tweepy with snscrape; it's not really efficient, since you're basically scraping twice. When you scrape with snscrape there's a tweet object you can interact with that has a lot of information and will cover most use cases. I wouldn't recommend using tweepy's api.statuses_lookup unless you need specific information only offered through tweepy.
For those still unsure about using snscrape I did write an article for scraping with snscrape that I hope clears up any confusion about using that library, there’s also python scripts and Jupyter notebooks I’ve created to build off of. I also have a picture in the article showing all the information accessible in snscrape’s tweet object. https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af
Thank you so much @sufyanhamid, I'm happy it helped. As far as I know, the bounding-box query cannot be run in snscrape as it can in the Twitter Streaming API. You can use the geocode query instead, as in the Twitter REST API. Ex.
With this query, you can collect tweets within 5 miles of the point coordinate you specify. As far as I know, you can go up to 15 miles.
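The geocode query described above can be composed like this. A minimal sketch: the keyword, coordinates, and radius are illustrative, and the operator form geocode:lat,lon,radius is the one the thread uses.

```python
def geocode_query(term, lat, lon, radius_mi):
    """Search `term` within `radius_mi` miles of a point coordinate,
    using the geocode: search operator."""
    return f"{term} geocode:{lat},{lon},{radius_mi}mi"

# e.g. tweets about covid within 5 miles of a point
print(geocode_query("covid", 38.04, -84.5, 5))
```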
This really works. Many thanks. Just keep in mind that using snscrape may return too many results, so it is better to limit the number of tweet IDs using --max-results.
Same issue here. I think this is because Twitter has removed the https://twitter.com/i/search/timeline endpoint.
Unfortunately I have the same problem; I hope we find a solution as soon as possible.
Here is debug enabled. It shows the actual URL being called, and it seems that Twitter has removed the /i/search/timeline endpoint. 😦
@DV777 Yes, the parameters attached to tweepy apply to tweets that have already been scraped. On snscrape, if you remove the filter:replies parameter, you can get replies. You can also collect retweets by removing the filter:links parameter, but it mostly collects the links of the main tweet. I don't know if there's a way to get the number of likes with snscrape.
@Niehaus A query like this works, I hope it works for you:
import snscrape.modules.twitter as sntwitter
import csv

maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id', 'date', 'tweet'])

for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@burakoglakci + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()
I'm having the exact same problem. When I remove the date filter it works, but when I have it (exactly as in the quoted code), I get no results. Anyone else having this issue or know how to solve it? @burakoglakci it's not clear to me how the changes you made in the code would solve this problem.
Edit: I think I figured it out. There was simply a small error in the quoted code: you have to put a space before the 'since'.
With snscrape, this works:
snscrape --jsonl twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.json
or
snscrape twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt
Explanation from the developer: twitter-user is actually just a wrapper around twitter-search using the search term from:username (plus code to extract user information from the profile page)
Thank you very much! It worked!! Thank you once again and I feel grateful for your help! 😃
You can get the results by running a code like this:
snscrape --jsonl twitter-search "YOURSEARCHQUERY @USERTODLFROM #HASHTAGTODLFROM since:2020-09-01 until:2020-09-25" > mytweets.json
I then ran the .json file through a tiny piece of Python code to get my .csv, which is enough for me right now. You might want to check out the other answers if you're looking for something more elegant with more info.
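A minimal sketch of that jsonl-to-csv step, using only the standard library: the file names are illustrative, and the field names (id, date, content) match the tweet attributes used elsewhere in this thread. The pandas equivalent mentioned earlier would be pd.read_json("mytweets.json", lines=True).to_csv("mytweets.csv").

```python
import csv
import json

def jsonl_to_csv(jsonl_path, csv_path, fields=("id", "date", "content")):
    """Write one CSV row per JSON line, keeping only the chosen fields."""
    with open(jsonl_path, encoding="utf8") as src, \
         open(csv_path, "w", newline="", encoding="utf8") as dst:
        writer = csv.writer(dst)
        writer.writerow(fields)
        for line in src:
            tweet = json.loads(line)
            writer.writerow([tweet.get(f) for f in fields])
```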
@HuifangYeo, if you really need to get data from Twitter, try the Twitter API; I am using it like this:
I did it like that, but you have to limit the number of tweets, otherwise you will get error 429. I also tried twint, but it is not working currently, so right now I think the best approach is to use this. I use rate limiting, waiting 15 minutes every 15 calls to the Twitter API; this works well, but if you try to pull a lot of data, Twitter will give you another error 429. I hope this can help you, and good luck. 😃
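The pacing described above (wait 15 minutes after every 15 API calls) can be sketched generically like this. The batch size and pause are the commenter's figures; note that tweepy's wait_on_rate_limit=True can handle this automatically.

```python
import time

def paced(calls, per_batch=15, pause=15 * 60, sleep=time.sleep):
    """Yield results from `calls`, sleeping `pause` seconds after
    every `per_batch` items to stay under the rate limit."""
    for i, call in enumerate(calls, start=1):
        yield call
        if i % per_batch == 0:
            sleep(pause)

# Usage: for tweet in paced(api_calls): ...
```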
Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html
So is there any way to get historical tweets for a hashtag? Like the most popular hashtag for the word ripple, for example, from 2015?
Tweepy has a depth limit of one week, and I tried GOT but I have the same issue as here (404). Does anyone have another solution for building a database of historical tweets? 😃
Thanks!
Thanks for your help @burakoglakci, I'd be lost without this. The thing is, when collecting a timeline I do not get the retweets, replies and likes of the account I am scraping, and I guess these parameters apply to tweets that have already been scraped. I tried to find a way to scrape the full activity of an account, but it seems quite hard. For example, even using the following code:
I do not get the retweets / replies / likes made by the account, only its own created tweets. Is there a way to scrape the whole thing? Would you have a list of the additional parameters I could add to the scraping? Also, I do have Twitter API keys; the problem is that tweepy and the Twitter API only let me collect a maximum of 3000 tweets when scraping an account's timeline, at least when I was using it in 2019. Is this still the case?
First, use snscrape to collect the tweets you want, including tweet IDs and links. You can save your tweets in a CSV or TXT file.
Then collect the tweet objects using this code. The code I share here is based on tweepy: it queries using tweet IDs, then finds and collects the objects you want (likes, retweets).
Change from:@Username to keywords or #hashtag to search by keyword as opposed to username.
Thanks to all who made this code available! smooth program and helpful for current project!
Arizona USA id: a612c69b44b2e5da
Florida USA id: 4ec01c9dbc693497
To find these IDs, you have to run a geocode query on Twitter. Ex. geocode:34.684879,-111.699645,1mi. These coordinates let you search a point location in Arizona; you can use any map service to get coordinates. Then click on the content of a tweet that appears as a result of this query. You will see Arizona, USA as the place name on the tweet content (if not, try another tweet). After clicking on the place name, you will see the place ID in the link in the search bar.
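Once found, the place IDs above can be dropped into a search query with the place: operator shown in the search bar. A minimal sketch; the keyword and helper function are illustrative, only the IDs come from the comment.

```python
# Place IDs quoted from the comment above.
PLACE_IDS = {
    "Arizona, USA": "a612c69b44b2e5da",
    "Florida, USA": "4ec01c9dbc693497",
}

def place_query(term, place_name):
    """Search `term` restricted to a known place ID."""
    return f"{term} place:{PLACE_IDS[place_name]}"
```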
@Woolwit Thanks for sharing the additional attributes of a tweet. Could you also share the code/query for getting the number of likes, retweets, and comments? Thanks in advance.
Anyone have a tip for getting all the tweets in an individual’s timeline? Have managed to get user tweets (thank you @burakoglakci for your example) but would like to get the tweets the user retweets as well (tweet.retweetedTweet didn’t get it). And for any other noobish coders out there, just in case this helps.
Yes, it is a matter of indentation; it happened to me as well. The "if i > maxTweets:" needs to start an indented block, with "break" indented under it. The "csvWriter.writerow(...)" call needs to be aligned with the "if i > maxTweets:", and "csvFile.close()" is outside the loop and needs to be aligned with the "for i, tweet in enumerate(...)".
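For reference, the loop with the indentation described above looks like this. A plain generator stands in for sntwitter.TwitterSearchScraper(...).get_items() so the shape can be checked without a network call; swap in the real scraper for actual use.

```python
import csv

def fake_get_items():
    """Stand-in for the snscrape tweet stream (illustrative data only)."""
    Tweet = type("Tweet", (), {})
    for n in range(5):
        t = Tweet()
        t.id, t.date, t.content = n, "2020-01-01", f"tweet {n}"
        yield t

maxTweets = 3
csvFile = open("result.csv", "w", newline="", encoding="utf8")
csvWriter = csv.writer(csvFile)
csvWriter.writerow(["id", "date", "tweet"])
for i, tweet in enumerate(fake_get_items()):
    if i >= maxTweets:  # the if and its break are indented inside the for
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])  # aligned with the if
csvFile.close()  # back at top level, aligned with the for
```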
When it comes to the scraping of likes/retweets, I did not find any easy way to do it with snscrape. I have useed tweepy. Here is the link I have followed: https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032
Note, you need to request the Twitter Developer role because you need all the keys.
Hope it helps!
Hi!
I am using Python 3.8.6. When I run this query:
('from:@JoeBiden + since:2020-01-01 until:2020-11-10 -filter:links -filter:replies').get_items()):
I collected 901 tweets.
Hello! I am using the last snscrape query, but it is not working for me. I am using @joebiden from 2020-01-01 and I am getting a weird output with just 1 tweet. I am a Mac user, if that matters. I really do not know what is going on. I literally copy-paste the code and change the handle, but it does not work. Any hints? Thank you so much!
@sbif
Use this query with snscrape:
import snscrape.modules.twitter as sntwitter
import csv

maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id', 'date', 'tweet'])

for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:@BillGates + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()
@burakoglakci Can you please help me with the query to get tweets from a specific user?
@bensilver95 @Niehaus
Absolutely, our queries are working. The code I added in the previous post was not displayed correctly. If you want to add a location filter to your query,
you can run this query; with it, you can collect tweets about covid shared from the state of Kentucky. Querying shorter date ranges, as with GOT, can yield better results, because for queries that match too many tweets Twitter can stop responding.
@TamiresMonteiroCD @WelXingz @ahsanspark @Atoxal @SophieChowZZY
I think I solved the problem. I made a few changes to the lines. I collect tweets using a word and location filter. I’m using Python 3.8.6 on Windows 10 and it works fine right now.
This intermittent failure seems to be related to the random choice of user agent in TweetManager.py, where user_agent = random.choice(TweetManager.user_agents ...). I believe that a loop scanning the user-agent list with exception handling would solve this problem.
Not sure why, but I had the same problem. I replaced tweet.renderedContent with tweet.content and it works!
Add lang:en (without quotes) inside the query string; note the leading space when concatenating. Example:
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' lang:en').get_items()):
Yes, you can add or remove filters as per your need.
Can you please tell me how to get tweets with multiple keywords in a search query, like "Jobs AND (unemployment OR government)"? @ppival
@irwanOyong I was having the same issue, the reason is I wasn’t using the development version of snscrape. Be sure to install it with
pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git
Once I did that it worked like @ppival said it should.
What an excellent opportunity to write a chapter about politics of APIs in the context of research! 😅 Your supervisor will have references for literature I am sure (and depending on your field), but you can look at publications from the Digital Methods Initiative at the University of Amsterdam, including people like Anne Helmond.
Found this in issues for Twint: https://github.com/twintproject/twint/pull/917#issuecomment-697361036
Worked for me
This was in the issues for taspinar/twitterscraper, which also stopped working recently:
https://github.com/taspinar/twitterscraper/issues/344
I see! I'm fairly new to scraping, but I'm working on an end-of-course thesis about sentiment analysis and could really use some newer tweets to help me out.
I've been tinkering with GOT3's code a bit and got it to read the HTML of the search timeline, although it's mostly unformatted. Like I said, I have little experience with scraping, so I'm really struggling to format it correctly. However, I will note my changes, for reference and for someone with more experience to pick up if they so wish:
updated user_agents (updated with the ones used by TWINT);
updated endpoint (/search?)
some updates to the URL structure:
Edit: Forgot to say this. Sometimes the application gives me a 400 Bad Request; I run it again and it outputs the HTML as described above.
I forked and created a branch to allow a user-specified UA, using samples from my current browser doesn’t fix the problem.
I notice the search and referrer URL shown in --debug output (https://twitter.com/i/search/timeline) returns a 404 error.
EDIT: The URL used for the internal search and the one shown in the exception message aren't the same…