twarc: twarc2 search without configure on Windows throws JSON parse error

I ran the following command:

twarc2 search '#ENDSARS-is:retweet' --start-time 2017-12-01 --end-time 2020-11-30 --flatten --archive C:\Users\USER\Desktop\MyTwarcResults.json

and got the following error:

Traceback (most recent call last):
  File "C:\Users\USER\PycharmProjects\workspace\venv\Scripts\twarc2-script.py", line 33, in <module>
    sys.exit(load_entry_point('twarc==2.0.6', 'console_scripts', 'twarc2')())
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\twarc\decorators.py", line 172, in __call__
    result = e.response.json()
  File "c:\users\user\pycharmprojects\workspace\venv\lib\site-packages\requests\models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What exactly is the cause of this error, and how can I get help?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 63 (37 by maintainers)

Most upvoted comments

This is a special appreciation to you both for your constant support and perseverance. I am so glad to inform you that my twarc2 now works and generates data. I can’t thank you enough, sirs, as all your responses were useful. I sincerely appreciate your patience. For the record, I think the improper configuration of twarc2 (in addition to the twarc configuration) contributed to those errors.

I will begin to work on twarc2 for my archival data collection now. I will be so glad if you would come to my rescue if problems arise.

Thank you once again for the support.

On Wed, May 5, 2021 at 6:32 PM Ed Summers wrote:

Yeah, a 400 error from the API is documented https://developer.twitter.com/en/support/twitter-api/error-troubleshooting#http-status-codes as:

The request was invalid or cannot be otherwise served. An accompanying error message will explain further. Requests without authentication or with invalid query parameters are considered invalid and will yield this response.

and then suggests:

Double check the format of your JSON query. For example, if your rule contains double-quote characters associated with an exact-match or other operator, you may need to escape them using a backslash to distinguish them from the structure of the JSON format.

But we’re not actually sending any JSON as part of the search/recent API call; it’s just a GET. I guess if we set the logging level to DEBUG we might get some underlying information from requests/urllib3 (https://stackoverflow.com/questions/16337511/log-all-requests-from-the-python-requests-module)? A --log-level option might actually be nice to have…
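
The DEBUG logging idea mentioned above can be tried from a short Python script even without a --log-level option. This is a minimal sketch using only the standard library; `enable_http_debug_logging` is a hypothetical helper name and is not part of twarc or requests.

```python
import http.client
import logging


def enable_http_debug_logging():
    """Turn on verbose HTTP logging for code using requests/urllib3.

    Hypothetical helper for illustration; not part of twarc.
    """
    # http.client prints raw request and response header lines to stdout.
    http.client.HTTPConnection.debuglevel = 1

    # Send log records at DEBUG and above to the console.
    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)

    # urllib3 is the library that requests uses for connections.
    urllib3_logger = logging.getLogger("urllib3")
    urllib3_logger.setLevel(logging.DEBUG)
    urllib3_logger.propagate = True
    return urllib3_logger
```

Calling `enable_http_debug_logging()` only affects HTTP calls made later in the same Python process (for example, through the twarc library). It has no effect on a separate `twarc2` command-line run.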

– Kingsley Oladayo Ogunne

Department of Corporate Services Obafemi Awolowo University Teaching Hospitals Complex P.M.B. 5538 Ile-Ife, Nigeria

Telephone: +2348088444325, +2349050054242

With @AbirRes’ help we were able to figure out that the bearer token was not persisted to the configuration file correctly. It was a ctrl-v character, which seemed to really confuse the Twitter API. I think the ctrl-v ended up in the configuration file because we were previously hiding the input of the token (for screen recording). It could be that some Windows terminals aren’t set up to do ctrl-v properly, and users could not see that it wasn’t working since it was hidden. Tokens should now appear in the console to help catch this in the future.

So if you have this problem, please make sure you are using twarc v2.1.5 or higher:

pip install --upgrade twarc

and then reconfigure twarc2:

twarc2 configure

Hopefully that will allow you to use twarc2 subcommands going forwards. Thanks for everyone’s patience on this!
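
For anyone who wants to check a pasted token by hand, the stray ctrl-v character described above can be detected with a few lines of Python. This is only a sketch: `find_suspicious_chars` is a hypothetical helper, not something twarc provides, and it assumes that a valid token contains no control or whitespace characters.

```python
import unicodedata


def find_suspicious_chars(token):
    """Return (index, code point) pairs for control or whitespace characters.

    Hypothetical helper; assumes a valid token has no such characters.
    """
    found = []
    for index, char in enumerate(token):
        # Unicode categories starting with "C" cover control characters
        # such as ctrl-v (U+0016).
        if unicodedata.category(char).startswith("C") or char.isspace():
            found.append((index, "U+{:04X}".format(ord(char))))
    return found
```

An empty list means that nothing obviously wrong was pasted. Any entries give the position of the offending character in the token.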

@AbirRes what do you see when you run twarc2 search blm ?

I get the message: Unable to parse 400 error as JSON: Bad Request.

I am sorry, I can’t post a snapshot as I am not in front of my system right now.

@edsu I am using Windows 10 and Python 3.9.5, which I downloaded from the official website. I also tried it after downloading Anaconda, using the Anaconda prompt to run the commands. Furthermore, I followed the usual/suggested install methods and did not do anything custom to change the path, etc.

Hi @edsu, not sure if this thread is still active. I am facing a similar issue to @osemele’s, “unable to parse 400 error as json: Bad request”, with twarc2. I have been able to successfully configure twarc2 as well as twarc, so the fix suggested above does not work for me. twarc runs perfectly for me, but twarc2, unfortunately, does not. When I run the command twarc2 stream blm > tweets.json1, it creates a file “tweets” but without any data. I have tried installing and uninstalling Anaconda, Python, etc., but nothing has worked so far. I also tried on a computer where the username does not contain a space, to avoid pip breaking, but that did not seem to be the problem either. I am sorry for the long post, but I can’t seem to find the fix, and twarc2 seems to do exactly what I need, which is why I really want it to work. I would really appreciate any suggestions you could provide.

This also has me wondering if the input should actually display the keys on the console. It seems to be causing some confusion.

Do you find my suggestion to configure twarc2 separately, to avoid the error, useful?

On Mon, May 10, 2021, 11:39 AM Igor Brigadir wrote:

Do the old stand-alone apps have access to the Twitter v2 API?

Not by default, but the same keys work for both v1.1 and v2 if the app is set up in a Project on the dashboard (https://developer.twitter.com/en/portal/projects-and-apps), so maybe we could link that in a “warning” when loading configs this way?

“Profiles” sound like a good feature for sure.

Yeah, that would be nice if it wasn’t too tricky. Do the old stand-alone apps have access to the Twitter v2 API? I guess it is confusing for someone who might use twarc and twarc2 concurrently. I wanted to update twarc2 to allow for “profiles” like twarc.

That’s very helpful thanks @osemele . We will test running twarc2 search without having run twarc2 configure first on Windows.

Yes, all this while I never knew I hadn’t configured my twarc2 properly. First, I thought that since I had configured twarc, that would suffice for twarc2. Second, the error messages I was getting from twarc2 did not point towards missing or poor configuration. They never directly mentioned authorization as a problem, so my attention never went to configuration. Also, whenever I attempted to configure twarc2, it never displayed the bearer token, API secret and token secret on my screen while I pasted them. So in most cases I abandoned the process, until I read somewhere that not displaying such secret keys and tokens was normal when configuring twarc2.

I think those getting similar errors to mine, especially when the Python environment has been properly created, should also look into their twarc2 configuration specifically.

Thank you once again.

On Fri, May 7, 2021 at 12:18 PM Ed Summers wrote:

@osemele that is awesome news! Do you know what you did to fix it? It would be useful for us to know if this situation ever arises again.

Building on that a bit more: to test your Python environment, can you run this little program after replacing CHANGEME with your Bearer Token?

https://gist.github.com/edsu/a1a86ff8398edaef3010e3453665e6d6

If that works then it must be something in twarc.
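
Making no assumption about what the gist linked above contains, a bearer token check that does not involve twarc might look roughly like this sketch. It uses only the Python standard library and calls the v2 recent search endpoint. The function names `build_request` and `check_token` and the default query are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Twitter API v2 recent search endpoint.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_request(bearer_token, query="blm"):
    """Build (but do not send) an authenticated GET request."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Authorization": "Bearer " + bearer_token}
    return urllib.request.Request(url, headers=headers)


def check_token(bearer_token):
    """Send one search request and return (HTTP status, body)."""
    try:
        with urllib.request.urlopen(build_request(bearer_token)) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # On 400/401 the body may not be JSON, so return it as raw text.
        return error.code, error.read().decode("utf-8", errors="replace")
```

Calling `check_token("CHANGEME")` with a real token in place of CHANGEME should return a 200 status if the token and the Python environment are fine. A 400 or 401 status here would point at the token itself and not at twarc.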

'#ENDSARS-is:retweet'

I think this query is missing a space; it should be "#ENDSARS -is:retweet"

Another issue may be the ' vs " quotes - so the full command that might work is:

twarc2 search --start-time "2017-12-01" --end-time "2020-11-30" --flatten --archive "#ENDSARS -is:retweet" "C:\Users\USER\Desktop\MyTwarcResults.json"

Does that give the same error?
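
One way to rule out the quoting question entirely is to launch the command from Python with an argument list, so that no shell has to interpret the quotes. The Windows command prompt (cmd.exe) does not treat single quotes as quoting characters, which is why ' and " can behave differently there. The sketch below reuses only the options that appear in the command above. The helper names `build_search_args` and `run_search` are hypothetical, and it assumes that `twarc2` is on the PATH.

```python
import subprocess


def build_search_args(query, start_time, end_time, outfile):
    """Return the twarc2 search command as a list of separate arguments."""
    return [
        "twarc2", "search",
        "--start-time", start_time,
        "--end-time", end_time,
        "--flatten",
        "--archive",
        query,    # passed as one argument, including the space
        outfile,
    ]


def run_search(query, start_time, end_time, outfile):
    """Run twarc2 without a shell; assumes twarc2 is on the PATH."""
    args = build_search_args(query, start_time, end_time, outfile)
    return subprocess.run(args, check=False)
```

For example, `run_search("#ENDSARS -is:retweet", "2017-12-01", "2020-11-30", r"C:\Users\USER\Desktop\MyTwarcResults.json")` passes the query to twarc2 as a single argument whatever the terminal's quoting rules are.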