twitterscraper: KeyError: 'items_html' when scraping
When I scrape tweets for a given day, twitterscraper stops partway through that day and returns the following error:
ERROR: An unknown error occurred! Returning tweets gathered so far.
Traceback (most recent call last):
File "/home/erb13020/PycharmProjects/untitled/venv/lib/python3.8/site-packages/twitterscraper/query.py", line 173, in query_tweets_once_generator
new_tweets, new_pos = query_single_page(query, lang, pos)
File "/home/erb13020/PycharmProjects/untitled/venv/lib/python3.8/site-packages/twitterscraper/query.py", line 100, in query_single_page
html = json_resp['items_html'] or ''
KeyError: 'items_html'
Sometimes it will gather up to 20,000 tweets for a certain query on a certain day. Sometimes it will stop at around 20 tweets. Here is my full output for scraping all tweets about ‘tesla’ on March 1, 2020.
INFO: {'User-Agent': 'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko', 'X-Requested-With': 'XMLHttpRequest'}
Scraping tweets for 1/3/2020
INFO: queries: ['tesla since:2020-03-01 until:2020-03-02']
INFO: Querying tesla since:2020-03-01 until:2020-03-02
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q=tesla%20since%3A2020-03-01%20until%3A2020-03-02&l=
INFO: Using proxy 119.2.54.204:31322
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-1234265965061386240-1234267078539898880&q=tesla%20since%3A2020-03-01%20until%3A2020-03-02&l=
INFO: Using proxy 128.199.214.87:3128
ERROR: An unknown error occurred! Returning tweets gathered so far.
Traceback (most recent call last):
File "/home/erb13020/PycharmProjects/untitled/venv/lib/python3.8/site-packages/twitterscraper/query.py", line 173, in query_tweets_once_generator
new_tweets, new_pos = query_single_page(query, lang, pos)
File "/home/erb13020/PycharmProjects/untitled/venv/lib/python3.8/site-packages/twitterscraper/query.py", line 100, in query_single_page
html = json_resp['items_html'] or ''
KeyError: 'items_html'
INFO: Got 18 tweets for tesla%20since%3A2020-03-01%20until%3A2020-03-02.
INFO: Got 18 tweets (18 new).
Scraped 13 tweets for 1/3/2020
Here is my code:
import datetime as dt
import random

import pandas as pd
import twitterscraper.query
from twitterscraper import query_tweets

# User-Agent strings; one is picked at random when the script starts.
HEADERS_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13',
    'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
    'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
    'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre'
]
# Override the module-level request header used by twitterscraper.
twitterscraper.query.HEADER = {'User-Agent': random.choice(HEADERS_LIST), 'X-Requested-With': 'XMLHttpRequest'}

def scrape(d, m, y, query):
    # Scrape a single day: from begin_date up to (not including) the next day.
    begin_date = dt.date(y, m, d)
    end_date = begin_date + dt.timedelta(days=1)
    tweets = query_tweets(query, begindate=begin_date, enddate=end_date)
    df = pd.DataFrame(t.__dict__ for t in tweets)
    print(f'Scraped {len(df)} tweets for {d}/{m}/{y}')
    return df
I have looked around and experimented with different pool sizes, but I’m not sure what the issue is or where to start fixing it. I am currently using version 1.5.0. Thank you!
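For anyone trying to narrow this down, here is a minimal sketch of the failure mode as I read it from the traceback. It is not a fix confirmed by the maintainers. The line `json_resp['items_html']` raises as soon as a response lacks that key, so a tolerant lookup would at least avoid the crash. The helper name `extract_items_html` and both sample payloads are hypothetical, made up for illustration. I do not know what Twitter actually returns in the failing case.

```python
def extract_items_html(json_resp):
    """Return the HTML fragment of a timeline response, or '' if it is absent.

    Hypothetical helper: it mirrors the failing line in the traceback,
    but uses dict.get so a missing key does not raise KeyError.
    """
    return json_resp.get('items_html') or ''


# Invented payloads for illustration only; real responses may look different.
normal_page = {'items_html': '<li>tweet</li>', 'min_position': 'TWEET-1-2'}
unexpected_page = {'message': 'placeholder for a payload without items_html'}

print(repr(extract_items_html(normal_page)))
print(repr(extract_items_html(unexpected_page)))
```

Note that treating a missing key as an empty page only hides the symptom: the scraper would likely stop paginating quietly instead of raising, so the low tweet counts could remain. Logging the unexpected payload before falling back would probably say more about the root cause.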
About this issue
- State: open
- Created 4 years ago
- Reactions: 6
- Comments: 19 (2 by maintainers)
I am also facing the same issue. Any updates on how to fix this?
@ashgreat Same for me, except I was able to get a lot more tweets about Tesla, somewhere in the tens of thousands. It’s much better than version 1.5.0.
After testing, I’m getting a similar but different error. It’s scraping a lot more tweets than before, but I wanted to share what I’ve been getting with version 1.6.1.
It doesn’t cut off scraping, but this repeats a lot when I run the scraper.