GetOldTweets3: HTTP Error, Gives 404 but the URL is working

Hi, I had a script running over the past weeks and earlier today it stopped working. I keep receiving HTTPError 404, but the link provided in the error still takes me to a valid page. The code is below (all mentioned variables are set, and debugging shows the error happens specifically in the TweetManager call):

tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
    .setMaxTweets(max_count)\
    .setSince(begin_timeframe)\
    .setUntil(end_timeframe)
scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)

The error message is the standard 404 error, “An error occured during an HTTP request: HTTP Error 404: Not Found. Try to open in browser:”, followed by a link that opens fine in the browser.

As I have changed nothing in the project folder, I suspect something may have changed in my configuration rather than anything else, but I am wondering whether others are experiencing this too.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 74
  • Comments: 144

Most upvoted comments

Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html

Unfortunately the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can only search 5,000 tweets per month with the Twitter API.
I hope GetOldTweets3 starts working again as soon as possible, otherwise I cannot complete my master's thesis.

Is there any alternative solution for this? My master's thesis is on hold because of it. I tried snscrape as mentioned in the comment above, but it does not return results based on a search query string.

I used the query search below and it returns the links of the tweets.

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

I obtain the tweet IDs and then use tweepy to extract the tweets, as I needed more attributes (this may not be the best way to do it):

import datetime
import os
import time

# `api` is assumed to be an authenticated tweepy.API instance,
# and get_df() returns an (empty) pandas DataFrame to append to.
def get_tweets(tweet_ids, currency):
    statuses = api.statuses_lookup(tweet_ids, tweet_mode="extended")
    data = get_df() # define your own dataframe
    # printing the statuses
    for status in statuses:
        # print(status.lang)
        
        if status.lang == "en":
            mined = {
                "tweet_id": status.id,
                "name": status.user.name,
                "screen_name": status.user.screen_name,
                "retweet_count": status.retweet_count,
                "text": status.full_text,
                "mined_at": datetime.datetime.now(),
                "created_at": status.created_at,
                "favourite_count": status.favorite_count,
                "hashtags": status.entities["hashtags"],
                "status_count": status.user.statuses_count,
                "followers_count": status.user.followers_count,
                "location": status.place,
                "source_device": status.source,
                "coin_symbol": currency
            }

            last_tweet_id = status.id
            data = data.append(mined, ignore_index=True)

    print(currency, "outputing to tweets", len(data))
    data.to_csv(
        f"Extracted_TWEETS.csv", mode="a", header=not os.path.exists("Extracted_TWEETS.csv"), index=False
    )
    print("..... going to sleep 20s")
    time.sleep(20)

Note that tweet_ids is a list of 100 tweet ids.
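If it helps, here is a small sketch (not from the original comment) of how the tweet IDs can be pulled out of the snscrape output file above and passed to get_tweets in batches of 100; the filename is the one from the command above:

# Read the tweet URLs written by the snscrape command above and
# batch the IDs in groups of 100 for api.statuses_lookup.
with open("XRP_Sept_tweets.txt") as f:
    ids = [line.strip().split("/")[-1] for line in f if line.strip()]

for start in range(0, len(ids), 100):
    get_tweets(ids[start:start + 100], "XRP")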

@TamiresMonteiroCD @WelXingz @ahsanspark @Atoxal @SophieChowZZY

I think I solved the problem. I made a few changes to the lines. I collect tweets using a word and location filter. I’m using Python 3.8.6 on Windows 10 and it works fine right now.

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

#keyword = 'deprem'
#place = '5e02a0f0d91c76d2' #This geo_place string corresponds to İstanbul, Turkey on twitter.

#keyword = 'covid'
#place = '01fbe706f872cb32' This geo_place string corresponds to Washington DC on twitter.

#Open/create a file to append data to
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('deprem + place:5e02a0f0d91c76d2 + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
        if i > maxTweets :
            break  
        csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

@burakoglakci thanks for sharing your experience and work with us!! It is really appreciated and it helped me a lot. I want to ask: what would the query string be (using snscrape) if we want to get tweets by longitude and latitude, and also how can we find the geo-location ID of any city/country on Twitter? Thanks in advance 😃

For those who are still struggling to download tweets as CSV from snscrape, this works absolutely fine for me. Configuration: Windows 7 SP1 (64-bit), Python 3.8.6, pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git. Write this code in a new Jupyter Notebook and make sure it is using the Python 3.8.6 kernel.

Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

People here that have been using snscrape, can you post any code examples just doing a simple query search in script and not console? The lack of documentation is making this more trial and error as I learn the modules.

So far the only method of scraping tweets that still seems to work is snscrape’s jsonl method. A comment in this Twint issue explains how to do this. Please note you will need python 3.8 and the latest development version of snscrape. This doesn’t export the .json result to .csv though. For that I used an online solution at first, later I used the pandas library in python for the conversion.

I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. 404 error is gone but now there is 400 bad request error.

So is there any way to get historical tweets for a hashtag? For example, the most popular hashtag for the word ripple, going back to 2015?

Tweepy has a limit of one week of history; I tried GOT but I have the same issue as here (404). Does anyone have another solution for building a database of historical tweets? 😃

Thanks!

Yes, refer to my article mentioned above, where I cover the basics of using snscrape instead, since GetOldTweets3 is basically obsolete due to changes in Twitter's API: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

In regards to your specific use case, with snscrape you just put whatever query you want inside the quotes in the TwitterSearchScraper call and adjust the since and until operators to whatever time range you want. I created a code snippet for you below. You can take out the i>500 check if you don't want to restrict the number of tweets and just want every single one.

import snscrape.modules.twitter as sntwitter
import pandas as pd

tweets_list2 = []

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('#ripple since:2015-01-01 until:2016-01-01').get_items()):
    if i>500:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
   
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

@DV777 Hi!

https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032 you can get any tweet objects you want using the method described here. I created a script for my own work, and I share it below. I hope it’s useful 😃 You must have a Twitter developer account to use this method.

import pandas as pd
import tweepy
import csv

consumer_key = "aaaaaaaaaaaaaaaaaaaaa" 
consumer_secret = "aaaaaaaaaaaaaaaaaaaaaaaa" 
access_token = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 
access_token_secret = "aaaaaaaaaaaaaaaaaaaaaaaaaa"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

tweet_url = pd.read_csv("Your_Text_File.txt", index_col= None,
header = None, names = ["links"])

af = lambda x: x["links"].split("/")[-1]
tweet_url['id'] = tweet_url.apply(af, axis=1)
tweet_url.head()

ids = tweet_url['id'].tolist()
total_count = len(ids)
chunks = (total_count - 1) // 50 + 1

def fetch_tw(ids):
    list_of_tw_status = api.statuses_lookup(ids, tweet_mode= "extended")
    empty_data = pd.DataFrame()
    for status in list_of_tw_status:
            tweet_elem = {"date": status.created_at,
                     "tweet_id":status.id,
                     "tweet":status.full_text,
                     "User location":status.user.location,
                     "Retweet count":status.retweet_count,
                     "Like count":status.favorite_count,
                     "Source":status.source}
            empty_data = empty_data.append(tweet_elem, ignore_index = True)
    empty_data.to_csv("new_tweets.csv", mode="a")

for i in range(chunks):
        batch = ids[i*50:(i+1)*50]
        result = fetch_tw(batch)

Hey! For the ones struggling to use snscrape, I put together a little library to download tweets using snscrape/tweepy according to customizable queries. Although it’s still a work in progress, check this repo if you want to give it a try 😃

I don’t recommend using Tweepy with snscrape, it’s not really efficient, you’re basically scraping twice. When you scrape with snscrape there’s a tweet object you can interact with that has a lot of information that will cover most use cases. I wouldn’t recommend using tweepy’s api.statuses_lookup unless you need specific information only offered through tweepy.

For those still unsure about using snscrape I did write an article for scraping with snscrape that I hope clears up any confusion about using that library, there’s also python scripts and Jupyter notebooks I’ve created to build off of. I also have a picture in the article showing all the information accessible in snscrape’s tweet object. https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

Thank you so much @sufyanhamid, I'm happy it helped. As far as I know, the bounding box query cannot be run on snscrape, unlike in the Twitter Stream API. You can use the geocode query instead, as in the Twitter REST API. Ex.

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('geocode:40.682299,-73.944852,5mi + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
        if i > maxTweets :
            break

With this query, you can collect tweets within 5 miles of the point coordinate you specify. As far as I know, you can go up to 15 miles.

(Quoting the earlier comment: the snscrape #XRP search command and the tweepy get_tweets workflow above.)

This really works. Many thanks. Just keep in mind that using snscrape may return too many results, thus it is better to limit the number of tweet IDs using --max-results
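For example (assuming a snscrape version recent enough to support the flag):

snscrape --max-results 100 twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_sample.txt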

same issue here, I think this is because twitter has removed the endpoint https://twitter.com/i/search/timeline?

Unfortunately I have the same problem; I hope we find a solution as soon as possible.

Here is the output with debug enabled. It shows the actual URL being called, and it seems that Twitter has removed the /i/search/timeline endpoint. 😦

https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?vertical=news&q=from%3AREDACTED&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3AREDACTED&src=typd

@DV777 Yes, the parameters attached to tweepy apply to tweets that have already been scraped.

On snscrape if you remove the filter:replies parameter, you can get answers. You can also collect retweets by removing the filter:links parameter. But mostly collects the links of the main tweet. I don’t know if there’s a way to get the number of likes with snscrape.

@Niehaus A query like this works for me; I hope it works for you.

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',])

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@burakoglakci + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

For those who are still struggling to download tweets as CSV from snscrape, this works absolutely fine for me. Configuration: Windows 7 SP1 (64-bit), Python 3.8.6, pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git. Write this code in a new Jupyter Notebook and make sure it is using the Python 3.8.6 kernel. Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with Python 3.8.6 on Windows 10 and it didn't give me any results. It throws no errors, but I end up with an empty csv (only the headers). Is there something I might be missing?

Not sure why, but I had the same problem. I replaced tweet.renderedContent with tweet.content and it works!

Unfortunately that wasn't my case, but I found the problem: it was the date filter. I got all the results by removing it, but now I can't filter for a specific time range, which is bad.

I'm having the exact same problem. When I remove the date filter it works, but when I include it (exactly as it is in the quoted code), I get no results. Is anyone else having this issue or does anyone know how to solve it? @burakoglakci it's not clear to me how the changes you made in the code would solve this problem.

Edit: I think I figured it out. It's simply that there was a small error in the quoted code: you have to put a space before the 'since'.
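For example, the corrected query construction (the added space is the only change):

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()):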


With snscrape, this works:

snscrape --jsonl twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.json
or
snscrape twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt

Explanation from the developer: twitter-user is actually just a wrapper around twitter-search using the search term from:username (plus code to extract user information from the profile page)
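In other words, these two commands should return essentially the same tweets, only the user scraper also pulls profile information (a small illustration, not from the original comment):

snscrape twitter-user barackobama > obama.txt
snscrape twitter-search "from:barackobama" > obama.txt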

Hello everyone, Is it possible to use snscrape or some other way to get the tweets for a specified twitter handle within the mentioned date range?

I basically want to find an alternate working way for this below GetoldTweets3 command

GetOldTweets3 --username "barackobama" --since 2015-09-10 --until 2015-09-12

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

Hello… I am facing issues with snscrape. I do not have a command-line environment and I am not able to run the program. Can you please explain step by step how to run it from a Jupyter notebook? Getting the tweet IDs is enough, because I have tweepy to extract the tweets from the tweet IDs. I am also getting the error: module 'functools' has no attribute 'cached_property'.

I have miniconda (https://docs.conda.io/en/latest/miniconda.html) with Python 3.8. It doesn't seem to work on lower Python versions. Then just install snscrape as follows: pip3 install snscrape

from the miniconda terminal, you should be able to use snscrape directly:

[screenshot: running a snscrape command from the miniconda terminal]
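For instance, a quick smoke test from that terminal could look like this (the query and result cap are just examples):

snscrape --max-results 10 twitter-search "from:twitter"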

Thank you very much! It worked!! Thank you once again and I feel grateful for your help! 😃

Is there any alternative solution for this? My master's thesis is on hold because of it. I tried snscrape as mentioned in the comment above, but it does not return results based on a search query string.

You can get the results by running a command like this: snscrape --jsonl twitter-search "YOURSEARCHQUERY @USERTODLFROM #HASHTAGTODLFROM since:2020-09-01 until:2020-09-25" > mytweets.json

I then ran the .json file through this tiny python code to get my .csv , which is enough for me right now. You might wanna check out the other answers if you’re looking for something more elegant with more info.

import pandas as pd
from io import StringIO
with open('mytweets.json', 'r', encoding ='utf-8-sig') as f:
    data = f.readlines()
data = map(lambda x: x.rstrip(), data)
data_json_str = "[" + ','.join(data) + "]"
newdf = pd.read_json(StringIO(data_json_str))
newdf.to_csv("mytweets.csv", encoding ='utf-8-sig')
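As an aside (not part of the original comment), pandas can also read the --jsonl output directly as JSON Lines, which avoids the manual string join:

import pandas as pd

# Each line of the snscrape --jsonl output is a single JSON object.
newdf = pd.read_json('mytweets.json', lines=True)
newdf.to_csv("mytweets.csv", encoding='utf-8-sig')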

@HuifangYeo, if you really need to get data from twitter try the twitter api, I am using it like this:

import tweepy
import pandas as pd
import datetime
from datetime import timedelta
from ratelimit import limits, sleep_and_retry

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)


esperar = 900 # in seconds (15 minutes)
llamadas = 15 # calls to the API; per Twitter's documentation, 300 searches are allowed per 15 minutes
@sleep_and_retry
@limits(calls=llamadas, period=esperar)
def buscar_tweets(pais,query,idioma,date_since=None, maxItems = None):
    tweetContenido=[]
    tweetUsuario = []
    tweetUbicacion = []
    tweetPlaceName = []
    tweetCD = [] 
    tweetHashtag = []
    places = api.geo_search(query=pais, granularity="country")
    # Get the country's place ID
    place_id = places[0].id
    if date_since == None:
        if maxItems == None:
            try:
                # Search with today's date
                tweets = tweepy.Cursor(api.search,q=query and ("place:%s" % place_id),
                               lang=idioma,since=datetime.date.today(), 
                                   extended = True,tweet_mode='extended').items()
            except Exception as e:
                print("Hubo un error. Detalles: " + str(e))
        else:
            try:
                # Search with today's date
                tweets = tweepy.Cursor(api.search,q=query and ("place:%s" % place_id),
                               lang=idioma,since=datetime.date.today(), 
                                   extended = True,tweet_mode='extended').items(maxItems)
            except Exception as e:
                print("Hubo un error. Detalles: " + str(e))
    else:
        if maxItems == None:
            try:
                tweets = tweepy.Cursor(api.search,q=query and ("place:%s" % place_id),
                               lang=idioma,since=date_since, 
                                   extended = True,tweet_mode='extended').items()
            except Exception as e:
                print("Hubo un error. Detalles: " + str(e))
        else:
            try:
                tweets = tweepy.Cursor(api.search,q=query and ("place:%s" % place_id),
                               lang=idioma,since=date_since, 
                                   extended = True,tweet_mode='extended').items(maxItems)
            except Exception as e:
                print("Hubo un error. Detalles: " + str(e)) 
    for tweet in tweets:
        tweetContenido.append(tweet.full_text)
        tweetUsuario.append(tweet.user.name)
        tweetUbicacion.append(tweet.user.location)
        tweetPlaceName.append(tweet.place.name)
        tweetCD.append(tweet.created_at)
        tweetHashtag.append(query)
    return tweetContenido,tweetUsuario, tweetUbicacion, tweetPlaceName,tweetCD, tweetHashtag


I did it like that, but you have to limit the number of tweets, otherwise you will get error 429. I also tried twint but it is not working currently; right now I think this is the best approach. I am using the rate limiter to wait 15 minutes after every 15 calls to the Twitter API, which works well, but if you try to pull a lot of data, Twitter will still give you error 429. I hope this helps you, and good luck. 😃

(Quoting the earlier exchange: the question about historical hashtag tweets and the tweet-ID/tweepy lookup script above.)

Thanks for your help @burakoglakci, I'd be lost without this. The thing is, when collecting a timeline I do not get the retweets, replies and likes of the account I am scraping, and I guess these parameters apply to tweets that have already been scraped. I tried to find a way to scrape the full activity of an account, but it seems quite hard. For example, even by using the following code:

import snscrape.modules.twitter as sntwitter
import csv
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@username + since:2009-01-01 until:2020-11-05 -filter:links -filter:replies').get_items()):
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

I do not get the retweets / replies / likes made by the account, only its own created tweets. Is there a way to scrape the whole thing? Do you have a list of the additional parameters I could add to the scraping? Also, I do have Twitter API keys, but the problem is that tweepy and the Twitter API only let me collect about 3,000 tweets maximum when scraping an account's timeline, at least when I was using it in 2019. Is this still the case?

@burakoglakci: Thanks for sharing this

@csbhakat https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032 you can get any tweet objects you want using the method described here. I created a script for my own work, and I share it below. I hope it’s useful 😃 You must have a Twitter developer account to use this method.

(Same script as shared above: read the tweet URLs from a text file, extract the IDs, and look them up in batches of 50 with tweepy's statuses_lookup.)

@burakoglakci: For this code, do I need to get all the links and store them in the "Your_Text_File.txt" file, and based on those links the code will scrape the tweets, right? Suppose I want to get all tweets from March 2020 to October 2020 for #amazon; how can I do that? Does your code help in that case?

First, use snscrape to collect the tweets you want, including the tweet IDs and links. You can save them to a csv or txt file.

Then collect the tweet objects using this code. The code I shared here is based on tweepy: it queries using the tweet IDs and collects the objects you want (likes, retweets).

Hello all! I am a beginner with Python & coding in general. Do you think GOT will be updated anytime soon so that timeline scraping can resume? Also, how can we get more information out of the tweets currently extractable thanks to @burakoglakci and the use of snscrape? Is it possible to get the number of likes, replies, etc. on tweets, for example? I used the following code and it works fine; thanks to all of you who offered an alternative to continue scraping Twitter 👍

import snscrape.modules.twitter as sntwitter
import csv
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@username + since:2009-01-01 until:2020-11-05 -filter:links -filter:replies').get_items()):
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()
 

Change from:@Username to a keyword or #hashtag in the query to search by keyword instead of by username.
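For example, adapting the query above to a hashtag (the hashtag and dates are only placeholders):

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('#ripple since:2009-01-01 until:2020-11-05 -filter:links -filter:replies').get_items()):
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])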

Thanks to all who made this code available! smooth program and helpful for current project!

(Quoting the word-and-location filter solution and the geocode question above.)

Hi! First of all, I am very grateful for all your support, thank you. I have a problem with the place ID: I need the place IDs for Arizona and Florida but I cannot find them. Could someone tell me how I can get these (and other place IDs), please? Thanks in advance ❤️

Arizona USA id: a612c69b44b2e5da

Florida USA id: 4ec01c9dbc693497. To find these IDs, you have to run a geocode query on Twitter, e.g. geocode:34.684879,-111.699645,1mi; these coordinates let you search around a point location in Arizona (you can use any map service to get coordinates). Then click on the content of a tweet that appears as a result of this query; you should see "Arizona, USA" as the place name on that tweet (if not, check another tweet). After clicking on the place name, you will see the place ID in the URL in the search bar.
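Once you have a place ID, it slots into a search query the same way as in the earlier examples (the keyword and dates here are placeholders):

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('covid place:a612c69b44b2e5da since:2020-10-01 until:2020-11-01').get_items()):
    print(tweet.id, tweet.date, tweet.content)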

Does anyone have a tip for getting all the tweets in an individual's timeline? I have managed to get user tweets (thank you @burakoglakci for your example) but would like to get the tweets the user retweets as well (tweet.retweetedTweet didn't get it). And for any other noobish coders out there, just in case this helps:

import snscrape.modules.twitter as sntwitter

maxTweets = 10

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@TwitterSupport').get_items()) :
        if i > maxTweets :
            break  
        print(f"the date is {tweet.date}")  
        print(f"the user name is {tweet.user.username}")
        print(f"the tweet content is {tweet.content}")
        print(f"the tweet rendered content is {tweet.renderedContent}")
        print(f"the outlinks are {tweet.outlinks}")  
        print(f"the tco outlinks are {tweet.tcooutlinks}") 
        print(f"the url is {tweet.url}")
        print(f"the retweeted tweet is  {tweet.retweetedTweet}")        
        print(f"the quoted tweet is  {tweet.quotedTweet}") 

@Woolwit Thanks for sharing more of the attributes of a tweet. Could you also share the code/query for getting the number of likes, retweets and comments? Thanks in advance.

(Quoting the timeline-scraping example above.)

@sunyoid I think your query string has changed; use this one:

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@Sunyoid4 + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break

I keep running into this problem every time I run this code:

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 20

csvFile = open('test5.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',])

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:SADWRIST-filter:links -filter:replies').get_items()):
    if i > maxTweets : break

csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

It saves the csv file for that user profile, but when I open the csv file it is empty (only 0 kB). The error is:

NameError                         Traceback (most recent call last)
<ipython-input-2-23f7456de87e> in <module>
     11     if i > maxTweets : break
     12
---> 13 csvWriter.writerow([tweet.id, tweet.date, tweet.content])
     14 csvFile.close()
     15

NameError: name 'tweet' is not defined

In the final code I posted before, it wasn't working; it only scraped one tweet from the profile. Does anyone have a solution to this?

Yes, it is a matter of indentation; it happened to me as well. The "if i > maxTweets:" needs to be indented inside the for loop, with "break" indented under it. The "csvWriter.writerow(...)" line needs to be aligned with the "if i > maxTweets:" (i.e. also inside the loop), and "csvFile.close()" goes outside the loop, aligned with the "for i, tweet in enumerate(…)" line.
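In other words, a corrected version of the loop above would look like this (note also the space before -filter:links):

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:SADWRIST -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()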

When it comes to scraping likes/retweets, I did not find any easy way to do it with snscrape. I have used tweepy. Here is the link I followed: https://medium.com/@jcldinco/downloading-historical-tweets-using-tweet-ids-via-snscrape-and-tweepy-5f4ecbf19032

Note, you need to request the Twitter Developer role because you need all the keys.

Hope it helps!

Hello! I am using the latest snscrape query, but it is not working for me. I am using @JoeBiden from 2020-01-01 and I am getting a weird output with just one tweet. I am a Mac user, if that matters. I really do not know what is going on. I literally copy-paste the code and change the handle, but it does not work. Any hints? Thank you so much!

Hi!

I am using Python 3.8.6. When I run this query:

('from:@JoeBiden + since:2020-01-01 until:2020-11-10 -filter:links -filter:replies').get_items()):

I collected 901 tweets.

Hello! I am using the last snscrape query, but it is not working for me. I am using @joebiden from 2020-01-01 and I am getting a weird output with just 1 tweet. I am a mac user, if that matters. I really do not know what is going on. I literally copy-paste the code and change the handle but it does not work. Any hints? Thank you so much!

@sbif

Hi guys! I'm totally lost: how can I use snscrape to extract tweets from a user within a specific time range? I'm a beginner with Python and I have to do this for my thesis. I've been trying to extract this data for three weeks without success; I tried with tweepy and then with GetOldTweets3, and I've just discovered this new Twitter API limit… Can somebody help me please?

Use this query with snscrape:

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',])

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@BillGates + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

@burakoglakci Could you please help me with the query to get tweets from a specific user?

@bensilver95 @Niehaus

Absolutely, our queries are working. The codes I added in the previous post were not displayed correctly. If you want to add a location filter to your query,

keyword = 'covid'

keyword + ' place:095534ad3107e0e6 + since:2020-10-20 until:2020-11-04 -filter:links -filter:replies').get_items()):

You can run this query; with it, you can collect tweets about covid shared from the state of Kentucky. Querying over shorter date ranges, as with GOT, can yield better results, because for queries matching too many tweets Twitter can stop responding.

(Reposting the word-and-location filter code shown correctly at the top of this thread.)

Edit: I forgot to say this. Sometimes the application gives me a 400: Bad Request; I run it again and it outputs the HTML as I said before.

This intermittent behaviour seems to be related to the random choice of user agent in TweetManager.py, where "user_agent = random.choice(TweetManager.user_agents …". I believe a loop scanning the user-agent list with exception handling would solve this problem.
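A minimal sketch of that idea (hypothetical; the user-agent list and URL are stand-ins for what TweetManager.py already has):

import urllib.request

def fetch_with_user_agent_fallback(url, user_agents):
    # Try each user agent in turn instead of a single random choice,
    # moving on to the next one if the request fails.
    last_error = None
    for ua in user_agents:
        try:
            req = urllib.request.Request(url, headers={"User-Agent": ua})
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except Exception as e:
            last_error = e
    raise last_error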

(Quoting the snscrape-to-CSV example above: keyword = 'Covid', since:2020-06-01 until:2020-06-30.)

May I ask, what if I want to filter the language of the tweets (e.g. only tweets in English)? How can I add a filter for that?

Add lang:en (without quotes) inside the query string, for example: for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' lang:en').get_items()):

(Quoting the same snscrape-to-CSV example again.)

This is when you filter by providing two dates, but how do you get all tweets? Just by removing the filter criteria?

Yes, you can add or remove filters as per your need.

People here that have been using snscrape, can you post any code examples just doing a simple query search in script and not console? The lack of documentation is making this more trial and error as I learn the modules.

import snscrape.modules.twitter as sntwitter

keyword = 'covid'  # your search term (placeholder)
maxTweets = 100    # stop after this many tweets

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2015-12-17 until:2020-09-25').get_items()):
    if i > maxTweets:
        break
    print(tweet.username)
    print(tweet.renderedContent)

snscrape twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt

Can you please tell me how to get tweets with multiple keywords in the search query, like "Jobs AND (unemployment OR government)"? @ppival

Hi @ppival @shelu16, thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me the list of tweet URLs, e.g.: https://twitter.com/irwanOyong/status/1309516653386842113

Tried the --jsonl and --with-entity but it failed. Any insight to get the item (tweet) details?

@irwanOyong I was having the same issue, the reason is I wasn’t using the development version of snscrape. Be sure to install it with pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

Once I did that it worked like @ppival said it should.

Is there any alternative solution for this? My master's thesis is on hold because of it.

What an excellent opportunity to write a chapter about politics of APIs in the context of research! 😅 Your supervisor will have references for literature I am sure (and depending on your field), but you can look at publications from the Digital Methods Initiative at the University of Amsterdam, including people like Anne Helmond.

Found this in issues for Twint: https://github.com/twintproject/twint/pull/917#issuecomment-697361036

The trick is to encapsulate the call to the scraper in a loop, and then each time, decrement the c.until. I’m using something like this:

for x in range(0, number_of_skips):
    days = x * -7
    end = start + timedelta(days)
    time.sleep(10)
    scrape(str(end))

The time.sleep (using the time module; in this case, ten seconds) helps avoid getting blocked on Twitter's end. number_of_skips is computed by the surrounding program: the length of time I want to scrape, in days, divided by the number of days per step (in this case, a week).

"scrape" is just:

def scrape(u_date):
    u_date += " 00:00:00"
    c = twint.Config()
    c.Search = st
    c.Store_object = True
    c.Limit = 40
    c.Until = u_date
    c.Lang = "fi"
    twint.run.Search(c)
    tlist = c.search_tweet_list
    # (and then print, store, whatever; and that's it for the loop)

Worked for me

This was in the issues for 'taspinar/twitterscraper', which also stopped working recently:

https://github.com/taspinar/twitterscraper/issues/344

Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html

Unfortunately the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can only search 5,000 tweets per month with the Twitter API. I hope GetOldTweets3 starts working again as soon as possible, otherwise I cannot complete my master's thesis.

I see! I'm fairly new to scraping, but I'm working on an end-of-course thesis about sentiment analysis and could really use some newer tweets to help me out.

I've been tinkering with GOT3's code a bit and got it to read the HTML of the search timeline; however, it's mostly unformatted. Like I said, I have little experience with scraping, so I'm really struggling to format it correctly. However, I will note my changes, for reference and for someone with more experience to pick up if they so wish:

  • updated user_agents (updated with the ones used by TWINT);

  • updated endpoint (/search?)

  • some updates to the URL structure:

      url = "https://twitter.com/search?"

        

        url += ("q=%%20%s&src=typd%s"
                "&include_available_features=1&include_entities=1&max_position=%s"
                "&reset_error_state=false")

        if not tweetCriteria.topTweets:
            url += "&f=live"`

Edit: Forgot to say this. Sometimes the application gives me a 400: Bad Request, I run it again, and it outputs the HTML like said before.

I’m not sure if it is related to this issue, but some of the user_agents seem to be out of date

I forked and created a branch to allow a user-specified UA; using samples from my current browser doesn't fix the problem.

I notice the search and referrer URL shown in --debug output (https://twitter.com/i/search/timeline) returns a 404 error:

$ GetOldTweets3 --username twitter --debug 
/home/inactivist/.local/bin/GetOldTweets3 --username twitter --debug
GetOldTweets3 0.0.11
Downloading tweets...
https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3Atwitter&src=typd
$ curl -I https://twitter.com/i/search/timeline
HTTP/2 404 
[snip]

EDIT The url used for the internal search, and the one shown in the exception message, aren’t the same…